MSH File: KALLIKREIN

TITLE: Novel Human Kallikrein-Like Genes

FIELD OF THE INVENTION

The invention relates to nucleic acid molecules, proteins encoded by such nucleic acid
molecules, and uses of the proteins and nucleic acid molecules.

BACKGROUND OF THE INVENTION

Kallikreins and kallikrein-like proteins are a subgroup of the serine protease enzyme 
family and exhibit a high degree of substrate specificity (1). The biological role of these 
kallikreins is the selective cleavage of specific polypeptide precursors (substrates) to release
peptides with potent biological activity (2). In mouse and rat, kallikreins are encoded by large 
multigene families. In the mouse genome, at least 24 genes have been identified (3). Expression 
of 11 of these genes has been confirmed; the rest are presumed to be pseudogenes (4). A similar 
family of 15-20 kallikreins has been found in the rat genome (5) where at least 4 of these are 
known to be expressed (6). 

Three human kallikrein genes have been described, i.e. prostate-specific antigen (PSA
or KLK3) (7), human glandular kallikrein (KLK2) (8) and tissue (pancreatic-renal) kallikrein
(KLK1) (9). The PSA gene spans 5.8 Kb, and its sequence has been published (7); the KLK2
gene has a size of 5.2 Kb, and its complete structure has also been elucidated (8). The KLK1
gene is approximately 4.5 Kb long, and the exon sequences and the exon/intron junctions of this
gene have been determined (9).

The mouse kallikrein genes are clustered in groups of up to 11 genes on chromosome
7, and the distance between the genes in the various clusters can be as small as 3-7 Kb (3). All
three human kallikrein genes have been assigned to chromosome 19q13.2-19q13.4, and the
distance between PSA and KLK2 has been estimated to be 12 Kb (9).

A major difference between mouse and human kallikreins is that two of the human
kallikreins (KLK2 and KLK3) are expressed almost exclusively in the prostate, while in animals
none of the kallikreins is localized in this organ. Other candidate new members of the human
kallikrein gene family include protease M (10) (also named Zyme (11) or neurosin (12)) and the
normal epithelial cell-specific gene-1 (NES1) (13). Both genes have been assigned to
chromosome 19q13.3 (10,14) and show structural homology with other serine proteases and the
kallikrein gene family (10-14).

SUMMARY OF THE INVENTION

In efforts to precisely define the relative genomic location of the PSA, KLK2, Zyme and
NES1 genes, an area spanning approximately 300 Kb of contiguous sequence on human
chromosome 19 (19q13.3-q13.4) was examined. The present inventor was able to identify the
relative location of the known kallikrein genes and, in addition, identified other putative
kallikrein-like genes which exhibit both location proximity and structural similarity with the
known members of the human kallikrein family. The novel genes exhibit homology with the
currently known members of the kallikrein family and are co-localized in the same genomic
region. These new genes, like the already known kallikreins, have utility in various cancers,
including those of the breast and prostate.

The kallikrein-like proteins described herein are individually referred to as "KLK-L1" to
"KLK-L8", and collectively as "kallikrein-like proteins" or "KLK-L Proteins". The genes
encoding the proteins are referred to as klk-l1 to klk-l8, kallikrein-like genes, or "klk-l genes".

Broadly stated, the present invention relates to an isolated nucleic acid molecule which
comprises:

(i) a nucleic acid sequence encoding a protein having substantial sequence identity,
preferably at least 70% sequence identity, with an amino acid sequence of
KLK-L1 to KLK-L8 as shown in Tables 2 to 9;

(ii) a nucleic acid sequence encoding a protein comprising an amino acid
sequence of KLK-L1 to KLK-L8 as shown in Tables 2 to 9;

(iii) nucleic acid sequences complementary to (i); 

(iv) a degenerate form of a nucleic acid sequence of (i); 

(v) a nucleic acid sequence capable of hybridizing under stringent conditions to a 
nucleic acid sequence in (i), (ii) or (iii); 

(vi) a nucleic acid sequence encoding a truncation, an analog, or an allelic or species
variation of a protein comprising an amino acid sequence of KLK-L1 to
KLK-L8 as shown in Tables 2 to 9; or

(vii) a fragment, or allelic or species variation of (i), (ii) or (iii). 

The invention also contemplates a nucleic acid molecule comprising a sequence
encoding a truncation of a KLK-L Protein, an analog, or a homolog of a KLK-L Protein or a
truncation thereof (KLK-L Proteins and truncations, analogs and homologs of the KLK-L
Proteins are also collectively referred to herein as "KLK-L Related Proteins").

The nucleic acid molecules of the invention may be inserted into an appropriate
expression vector, i.e. a vector that contains the necessary elements for the transcription and
translation of the inserted coding sequence. Accordingly, recombinant expression vectors
adapted for transformation of a host cell may be constructed which comprise a nucleic acid
molecule of the invention and one or more transcription and translation elements linked to the
nucleic acid molecule.

The recombinant expression vector can be used to prepare transformed host cells 
expressing KLK-L Related Proteins. Therefore, the invention further provides host cells 
containing a recombinant molecule of the invention. The invention also contemplates transgenic
non-human mammals whose germ cells and somatic cells contain a recombinant molecule 
comprising a nucleic acid molecule of the invention, in particular one which encodes an analog 
of the KLK-L Protein, or a truncation of the KLK-L Protein. 

The invention further provides a method for preparing KLK-L Related Proteins utilizing
the purified and isolated nucleic acid molecules of the invention. In an embodiment, a method
for preparing a KLK-L Related Protein is provided comprising (a) transferring a recombinant
expression vector of the invention into a host cell; (b) selecting transformed host cells from
untransformed host cells; (c) culturing a selected transformed host cell under conditions which
allow expression of the KLK-L Related Protein; and (d) isolating the KLK-L Related Protein.

The invention further broadly contemplates an isolated KLK-L Protein comprising the
amino acid sequence as shown in Tables 2 to 9.

The KLK-L Related Proteins of the invention may be conjugated with other molecules,
such as proteins, to prepare fusion proteins. This may be accomplished, for example, by the
synthesis of N-terminal or C-terminal fusion proteins.

The invention further contemplates antibodies having specificity against an epitope of
a KLK-L Related Protein of the invention. Antibodies may be labeled with a detectable
substance and used to detect proteins of the invention in tissues and cells.

The invention also permits the construction of nucleotide probes which are unique to the
nucleic acid molecules of the invention and/or to proteins of the invention. Therefore, the
invention also relates to a probe comprising a nucleic acid sequence of the invention, a
nucleic acid sequence encoding a protein of the invention, or a part thereof. The probe may be
labeled, for example, with a detectable substance, and it may be used to select from a mixture
of nucleotide sequences a nucleic acid molecule of the invention, including nucleic acid
molecules coding for a protein which displays one or more of the properties of a protein of the
invention.

The invention still further provides a method for identifying a substance which binds to
a protein of the invention, comprising reacting the protein with at least one substance which
potentially can bind with the protein, under conditions which permit the formation of complexes
between the substance and protein, and assaying for complexes, for free substance, or for
non-complexed protein. The invention also contemplates methods for identifying substances that
bind to other intracellular proteins that interact with a KLK-L Related Protein. Methods can also
be utilized which identify compounds which bind to KLK-L gene regulatory sequences (e.g.
promoter sequences).

Still further, the invention provides a method for evaluating a compound for its ability
to modulate the biological activity of a KLK-L Related Protein of the invention. For example,
a substance which inhibits or enhances the interaction of the protein and a substance which
binds to the protein may be evaluated. In an embodiment, the method comprises providing a
known concentration of a KLK-L Related Protein with a substance which binds to the protein
and a test compound under conditions which permit the formation of complexes between the
substance and protein, and removing and/or detecting complexes.

Compounds which modulate the biological activity of a protein of the invention may
also be identified using the methods of the invention by comparing the pattern and level of
expression of the protein of the invention in tissues and cells in the presence and in the absence
of the compounds.

The substances and compounds identified using the methods of the invention, and
peptides of the invention, may be used to modulate the biological activity of a KLK-L Related
Protein of the invention, and they may be used in the treatment of conditions such as cancer (e.g.
breast and prostate cancer). Accordingly, the substances and compounds may be formulated into
compositions for administration to individuals suffering from cancer.

Therefore, the present invention also relates to a composition comprising one or more
of a protein of the invention, a peptide of the invention, or a substance or compound identified
using the methods of the invention, and a pharmaceutically acceptable carrier, excipient or
diluent. A method for treating or preventing cancer is also provided, comprising administering
to a patient in need thereof a KLK-L Related Protein of the invention or a composition of the
invention.

Other objects, features and advantages of the present invention will become apparent
from the following detailed description. It should be understood, however, that the detailed
description and the specific examples, while indicating preferred embodiments of the invention,
are given by way of illustration only, since various changes and modifications within the spirit
and scope of the invention will become apparent to those skilled in the art from this detailed
description.

BRIEF DESCRIPTION OF THE DRAWINGS

The invention will now be described in relation to the drawings in which: 
Fig. 1. Approximately 300 Kb of contiguous genomic sequence around chromosome
19q13.3-q13.4, represented by 9 contigs, each shown with its length in Kb. The contig numbers
refer to those reported on the Lawrence Livermore National Laboratory website. Note the
localization of the four known genes (PSA, KLK2, Zyme, NES1). All genes are represented with
arrows denoting the direction of the coding strand. The position of the stratum corneum
chymotryptic enzyme (SCCE) gene is shown. The two genes with no homology to human
kallikreins are shown as hatched arrows. The eight putative kallikrein-like genes (KLK-L1 to
KLK-L8) are numbered from the most centromeric to the most telomeric. Numbers just below or
just above the arrows indicate lengths in Kb within each contig.

DETAILED DESCRIPTION OF THE INVENTION

1. Nucleic Acid Molecules of the Invention

As hereinbefore mentioned, the invention provides an isolated nucleic acid molecule
having a sequence encoding a KLK-L Protein. The term "isolated" refers to a nucleic acid
substantially free of cellular material or culture medium when produced by recombinant DNA
techniques, or of chemical reactants or other chemicals when chemically synthesized. An
"isolated" nucleic acid may also be free of sequences which naturally flank the nucleic acid (i.e.
sequences located at the 5' and 3' ends of the nucleic acid molecule) from which the nucleic acid
is derived. The term "nucleic acid" is intended to include DNA and RNA, which can be either
double stranded or single stranded. In an embodiment, a nucleic acid molecule encodes a
KLK-L Protein comprising the amino acid sequence as shown in Tables 2 to 9.

The invention includes nucleic acid sequences complementary to a nucleic acid encoding a
KLK-L Protein comprising the amino acid sequence as shown in Tables 2 to 9.

The invention includes nucleic acid molecules having substantial sequence identity or
homology to nucleic acid sequences of the invention, or encoding proteins having substantial
identity or similarity to the amino acid sequences shown in Tables 2 to 9. Preferably, the nucleic
acids have substantial sequence identity, for example at least 70% nucleic acid identity, more
preferably at least 80% nucleic acid identity, and most preferably at least 89% to 95% sequence
identity.

"Identity", as known in the art and used herein, is a relationship between two or more
amino acid sequences or two or more nucleic acid sequences, as determined by comparing the
sequences. It also refers to the degree of sequence relatedness between amino acid or nucleic
acid sequences, as the case may be, as determined by the match between strings of such
sequences. Identity and similarity are well-known terms to skilled artisans, and they can be
calculated by conventional methods (for example, see Computational Molecular Biology, Lesk,
A.M., ed., Oxford University Press, New York, 1988; Biocomputing: Informatics and Genome
Projects, Smith, D.W., ed., Academic Press, New York, 1993; Computer Analysis of Sequence
Data, Part I, Griffin, A.M. and Griffin, H.G., eds., Humana Press, New Jersey, 1994; Sequence
Analysis in Molecular Biology, von Heijne, G., Academic Press, 1987; Sequence Analysis
Primer, Gribskov, M. and Devereux, J., eds., M Stockton Press, New York, 1991; and Carillo, H.
and Lipman, D., SIAM J. Applied Math. 48:1073, 1988). Methods which are designed to give the
largest match between the sequences are generally preferred. Methods to determine identity and
similarity are codified in publicly available computer programs, including the GCG program
package (Devereux, J. et al., Nucleic Acids Research 12(1): 387, 1984), BLASTP, BLASTN,
and FASTA (Altschul, S.F. et al., J. Molec. Biol. 215: 403-410, 1990). The BLASTX program
is publicly available from NCBI and other sources (BLAST Manual, Altschul, S. et al., NCBI
NLM NIH Bethesda, Md. 20894; Altschul, S. et al., J. Mol. Biol. 215: 403-410, 1990).
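As an illustration of a method "designed to give the largest match between the sequences", the following is a minimal global alignment in the Needleman-Wunsch style. It is not the algorithm used by BLAST, FASTA or the GCG package (BLAST and FASTA are heuristic local-alignment tools). The function name `needleman_wunsch` and the scoring values (match +1, mismatch -1, gap -1) are illustrative assumptions; protein comparisons normally use a substitution matrix such as BLOSUM62 together with affine gap penalties.

```python
def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -1) -> tuple:
    """Global alignment of a and b; returns (aligned_a, aligned_b, score), with '-' for gaps."""
    n, m = len(a), len(b)
    # score[i][j] holds the best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def pair(i: int, j: int) -> int:
        return match if a[i - 1] == b[j - 1] else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + pair(i, j),  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,             # gap in b
                score[i][j - 1] + gap,             # gap in a
            )

    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b)), score[n][m]


print(needleman_wunsch("ACGT", "AGT"))  # ('ACGT', 'A-GT', 2)
```

Because the whole of both sequences must be aligned, the shorter sequence is padded with a gap where that costs least; a local method would instead report only the best-matching sub-region.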

Isolated nucleic acid molecules encoding a protein having the activity of a KLK-L Protein,
and having a sequence which differs from the nucleic acid sequences of the invention due to
degeneracy in the genetic code, are also within the scope of the invention. Such nucleic acids
encode functionally equivalent proteins (e.g. a KLK-L Protein) but differ in sequence from the
nucleic acid sequence encoding a KLK-L Protein due to degeneracy in the genetic code. As one
example, DNA sequence polymorphisms within the nucleotide sequence encoding a KLK-L
Protein may result in silent mutations which do not affect the amino acid sequence. Variations
in one or more nucleotides may exist among individuals within a population due to natural
allelic variation. Any and all such nucleic acid variations are within the scope of the invention.
DNA sequence polymorphisms may also occur which lead to changes in the amino acid sequence
of a KLK-L Protein. These amino acid polymorphisms are also within the scope of the present
invention.
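The effect of codon degeneracy can be checked directly by translating two DNA sequences with the standard genetic code. The sketch below builds the standard codon table from the conventional TCAG-ordered amino acid string ('*' marks a stop codon). The sequences are made-up examples, not KLK-L sequences, and the function name `translate` is an illustrative choice.

```python
BASES = "TCAG"
# Amino acid for each of the 64 codons in TCAG order (standard genetic code; '*' = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate a coding DNA (or mRNA) sequence codon by codon; a trailing partial codon is ignored."""
    seq = dna.upper().replace("U", "T")
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq) - 2, 3))


original = "ATGCTGAAA"
silent = "ATGTTAAAG"    # CTG->TTA and AAA->AAG: different codons, same amino acids
missense = "ATGCCGAAA"  # CTG->CCG: leucine becomes proline

print(translate(original), translate(silent), translate(missense))  # MLK MLK MPK
```

The first two sequences differ at three nucleotide positions yet encode the same peptide (a silent variation), while a single change in the third sequence alters the amino acid sequence, the two cases distinguished in the paragraph above.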

Another aspect of the invention provides a nucleic acid molecule which hybridizes under
stringent conditions, preferably high stringency conditions, to a nucleic acid molecule which
comprises a sequence encoding a KLK-L Protein having the amino acid sequence shown
in Tables 2 to 9. Appropriate stringency conditions which promote DNA hybridization are
known to those skilled in the art, or can be found in Current Protocols in Molecular Biology,
John Wiley & Sons, N.Y. (1989), 6.3.1-6.3.6. For example, 6.0 x sodium chloride/sodium
citrate (SSC) at about 45°C, followed by a wash of 2.0 x SSC at 50°C, may be employed. The
stringency may be selected based on the conditions used in the wash step. By way of example,
the salt concentration in the wash step can be lowered to a high stringency of about 0.2 x SSC
at 50°C. In addition, the temperature in the wash step can be raised to high stringency
conditions of about 65°C.
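The example conditions above can be turned into buffer quantities with a short calculation. This sketch assumes the standard composition of 1 x SSC (0.15 M NaCl, 0.015 M trisodium citrate) and a 20 x SSC stock solution; the stock strength is a common laboratory convention, not something stated in this specification, and the function names are hypothetical. It reports the total Na+ concentration of each buffer and the stock volume per litre from C1V1 = C2V2.

```python
# Standard composition of 1 x SSC.
NACL_1X_M = 0.15      # mol/L sodium chloride
CITRATE_1X_M = 0.015  # mol/L trisodium citrate (3 Na+ per molecule)


def sodium_molarity(ssc_strength: float) -> float:
    """Total Na+ (mol/L) in SSC of the given strength, e.g. 0.2 for 0.2 x SSC."""
    return ssc_strength * (NACL_1X_M + 3 * CITRATE_1X_M)


def stock_volume_ml(ssc_strength: float, final_volume_ml: float, stock_strength: float = 20.0) -> float:
    """Volume of concentrated SSC stock to dilute to final_volume_ml (C1*V1 = C2*V2).

    The 20 x stock default is an assumed laboratory convention.
    """
    if ssc_strength > stock_strength:
        raise ValueError("target strength exceeds stock strength")
    return ssc_strength * final_volume_ml / stock_strength


# Example conditions from the text: (step, SSC strength, temperature in deg C)
conditions = [
    ("hybridization", 6.0, 45),
    ("wash", 2.0, 50),
    ("high-stringency wash", 0.2, 50),
]
for step, strength, temp_c in conditions:
    print(f"{step}: {strength} x SSC at {temp_c} C -> "
          f"{sodium_molarity(strength):.3f} M Na+, "
          f"{stock_volume_ml(strength, 1000):.0f} mL of 20 x stock per litre")
```

Lowering the salt concentration and raising the temperature both increase stringency because they destabilize imperfectly matched duplexes, which is why the wash step is the usual place to adjust it.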

It will be appreciated that the invention includes nucleic acid molecules encoding a 
KLK-L Related Protein including truncations of a KLK-L Protein, and analogs of a KLK-L 
Protein as described herein. It will further be appreciated that variant forms of the nucleic acid 
molecules of the invention which arise by alternative splicing of an mRNA corresponding to a 
cDNA of the invention are encompassed by the invention. 

An isolated nucleic acid molecule of the invention which comprises DNA can be
isolated by preparing a labeled nucleic acid probe based on all or part of a nucleic acid
sequence of the invention. The labeled nucleic acid probe is used to screen an appropriate DNA 
library (e.g. a cDNA or genomic DNA library). For example, a cDNA library can be used to 
isolate a cDNA encoding a KLK-L Related Protein by screening the library with the labeled 
probe using standard techniques. Alternatively, a genomic DNA library can be similarly 
screened to isolate a genomic clone encompassing a gene encoding a KLK-L Related Protein. 
Nucleic acids isolated by screening of a cDNA or genomic DNA library can be sequenced by 
standard techniques. 

An isolated nucleic acid molecule of the invention which is DNA can also be isolated
by selectively amplifying a nucleic acid encoding a KLK-L Related Protein using
polymerase chain reaction (PCR) methods and cDNA or genomic DNA. It is possible to design
synthetic oligonucleotide primers from the nucleotide sequence of the invention for use in PCR.
A nucleic acid can be amplified from cDNA or genomic DNA using these oligonucleotide
primers and standard PCR amplification techniques. The nucleic acid so amplified can be
cloned into an appropriate vector and characterized by DNA sequence analysis. cDNA may be
prepared from mRNA by isolating total cellular mRNA by a variety of techniques, for example,
by using the guanidinium-thiocyanate extraction procedure of Chirgwin et al., Biochemistry, 18,
5294-5299 (1979). cDNA is then synthesized from the mRNA using reverse transcriptase (for
example, Moloney MLV reverse transcriptase available from Gibco/BRL, Bethesda, MD, or
AMV reverse transcriptase available from Seikagaku America, Inc., St. Petersburg, FL).
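Designing primers from a known sequence can be sketched in a few lines. The sketch assumes the simplest strategy: the forward primer is the first `length` bases of the target region and the reverse primer is the reverse complement of its last `length` bases. Melting temperature is estimated with the Wallace rule, Tm ≈ 2(A+T) + 4(G+C) °C, which is only a rough guide for short oligonucleotides. The function names and the toy template are illustrative; practical primer design also checks for self-complementarity, primer dimers and off-target binding.

```python
def reverse_complement(seq: str) -> str:
    """Reverse complement of an A/C/G/T sequence."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def gc_fraction(primer: str) -> float:
    """Fraction of G and C bases in the primer."""
    p = primer.upper()
    return (p.count("G") + p.count("C")) / len(p)


def wallace_tm(primer: str) -> int:
    """Rough Tm in deg C by the Wallace rule: 2 per A/T plus 4 per G/C (short oligos only)."""
    p = primer.upper()
    return 2 * (p.count("A") + p.count("T")) + 4 * (p.count("G") + p.count("C"))


def flanking_primers(template: str, length: int = 20) -> tuple:
    """Forward primer = first `length` bases; reverse primer = reverse complement of the last `length` bases."""
    if len(template) < 2 * length:
        raise ValueError("template is too short for two non-overlapping primers")
    forward = template[:length].upper()
    reverse = reverse_complement(template[-length:])
    return forward, reverse


template = "ATGAAACCCGGGTTTTAG"  # toy sequence, not a KLK-L gene
fwd, rev = flanking_primers(template, length=6)
for name, p in (("forward", fwd), ("reverse", rev)):
    print(name, p, f"GC={gc_fraction(p):.0%}", f"Tm~{wallace_tm(p)} C")
```

Both primers are written 5' to 3', so the reverse primer anneals to the coding strand at the 3' end of the region and the pair amplifies the sequence between them.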

An isolated nucleic acid molecule of the invention which is RNA can be isolated by 
cloning a cDNA encoding a KLK-L Related Protein into an appropriate vector which allows 
for transcription of the cDNA to produce an RNA molecule which encodes a KLK-L Related 
Protein. For example, a cDNA can be cloned downstream of a bacteriophage promoter (e.g. a
T7 promoter) in a vector. The cDNA can be transcribed in vitro with T7 RNA polymerase, and
the resultant RNA can be isolated by conventional techniques.

Nucleic acid molecules of the invention may be chemically synthesized using standard 
techniques. Methods of chemically synthesizing polydeoxynucleotides are known, including but 
not limited to solid-phase synthesis which, like peptide synthesis, has been fully automated in 
commercially available DNA synthesizers (see e.g. Itakura et al., U.S. Patent No. 4,598,049;
Caruthers et al., U.S. Patent No. 4,458,066; and Itakura, U.S. Patent Nos. 4,401,796 and
4,373,071).

Determination of whether a particular nucleic acid molecule encodes a KLK-L Related
Protein can be accomplished by expressing the cDNA in an appropriate host cell by standard
techniques and testing the expressed protein in the methods described herein. A cDNA
encoding a KLK-L Related Protein can be sequenced by standard techniques, such as
dideoxynucleotide chain termination or Maxam-Gilbert chemical sequencing, to determine the
nucleic acid sequence and the predicted amino acid sequence of the encoded protein.

The initiation codon and untranslated sequences of a KLK-L Related Protein may be
determined using computer software designed for the purpose, such as PC/Gene (IntelliGenetics
Inc., Calif.). The intron-exon structure and the transcription regulatory sequences of a gene
encoding a KLK-L Related Protein may be identified by using a nucleic acid molecule of the 
invention encoding a KLK-L Related Protein to probe a genomic DNA clone library. 
Regulatory elements can be identified using standard techniques. The function of the elements 
can be confirmed by using these elements to express a reporter gene such as the lacZ gene which 
is operatively linked to the elements. These constructs may be introduced into cultured cells 
using conventional procedures or into non-human transgenic animal models. In addition to 
identifying regulatory elements in DNA, such constructs may also be used to identify nuclear 
proteins interacting with the elements, using techniques known in the art. 
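A simple way to locate a candidate initiation codon is to scan the three forward reading frames for the longest open reading frame (ORF) that starts with ATG and ends at an in-frame stop codon. The sketch below does only that. It is not how the commercial software mentioned above works: it ignores the reverse strand and ORFs that lack a stop codon, and it does not consider the sequence context around the start codon (such as a Kozak consensus). The function name `longest_orf` is hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf(dna: str) -> tuple:
    """Return (start, end) of the longest ATG-initiated ORF on the forward strand.

    `end` is exclusive and includes the stop codon; (0, 0) means no complete ORF was found.
    """
    seq = dna.upper()
    best_start, best_end = 0, 0
    for frame in range(3):
        start = None  # first ATG seen in this frame since the last stop codon
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if i + 3 - start > best_end - best_start:
                    best_start, best_end = start, i + 3
                start = None
    return best_start, best_end


seq = "CCATGAAATTTTAGCC"
start, end = longest_orf(seq)
print(start, end, seq[start:end])  # 2 14 ATGAAATTTTAG
```

Keeping the first ATG in each frame, rather than every ATG, yields the longest ORF ending at a given stop codon, and the sequence upstream of that ATG is the candidate 5' untranslated region.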

In a particular embodiment of the invention, the nucleic acid molecules isolated using 
the methods described herein are mutant KLK-L gene alleles. The mutant alleles may be 
isolated from individuals either known or proposed to have a genotype which contributes to the 
symptoms of cancer (e.g. breast or prostate cancer). Mutant alleles and mutant allele products 
may be used in therapeutic and diagnostic methods described herein. For example, a cDNA of 
a mutant KLK-L gene may be isolated using PCR as described herein, and the DNA sequence
of the mutant allele may be compared to the normal allele to ascertain the mutation(s) 
responsible for the loss or alteration of function of the mutant gene product. A genomic library 
can also be constructed using DNA from an individual suspected of or known to carry a mutant
allele, or a cDNA library can be constructed using RNA from tissue known, or suspected to 
express the mutant allele. A nucleic acid encoding a normal KLK-L gene or any suitable 
fragment thereof, may then be labeled and used as a probe to identify the corresponding mutant 
allele in such libraries. Clones containing mutant sequences can be purified and subjected to 
sequence analysis. In addition, an expression library can be constructed using cDNA from RNA
isolated from a tissue of an individual known or suspected to express a mutant KLK-L allele.
Gene products made by the putatively mutant tissue may be expressed and screened, for 
example using antibodies specific for a KLK-L Related Protein as described herein. Library 
clones identified using the antibodies can be purified and subjected to sequence analysis. 

The sequence of a nucleic acid molecule of the invention, or a fragment of the molecule, 
may be inverted relative to its normal presentation for transcription to produce an antisense 
nucleic acid molecule. An antisense nucleic acid molecule may be constructed using chemical 
synthesis and enzymatic ligation reactions using procedures known in the art.

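Inverting a sequence "relative to its normal presentation" to give an antisense molecule amounts to taking the reverse complement of the coding strand. A minimal sketch, assuming plain A/C/G/T input (no IUPAC ambiguity codes); the function names and the toy sequence are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(dna: str) -> str:
    """Complementary strand written 5'->3' (complement each base, then reverse)."""
    return dna.translate(_COMPLEMENT)[::-1]


def antisense_rna(sense_dna: str) -> str:
    """RNA antisense to a sense (coding) DNA strand: reverse complement with U in place of T."""
    return reverse_complement(sense_dna.upper()).replace("T", "U")


sense = "ATGCTGAAA"  # toy coding sequence
print(reverse_complement(sense))  # TTTCAGCAT
print(antisense_rna(sense))       # UUUCAGCAU
```

The reversal matters: complementing base by base without reversing gives a sequence read 3' to 5', which is not the conventional way to write, synthesize or clone the antisense strand.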
2. Proteins of the Invention 

An amino acid sequence of a KLK-L Protein comprises a sequence as shown in Tables 2 to 9.

In addition to proteins comprising an amino acid sequence as shown in Tables 2 to 9, the
proteins of the present invention include truncations of a KLK-L Protein, analogs of a KLK-L
Protein, and proteins having sequence identity or similarity to a KLK-L Protein, and
truncations thereof as described herein (i.e. KLK-L Related Proteins). Truncated proteins may
comprise peptides of between 3 and 70 amino acid residues, ranging in size from a tripeptide
to a 70-mer polypeptide.

The truncated proteins may have an amino group (-NH2), a hydrophobic group (for
example, carbobenzoxyl, dansyl, or t-butyloxycarbonyl), an acetyl group, a
9-fluorenylmethoxycarbonyl (FMOC) group, or a macromolecule, including but not limited to
lipid-fatty acid conjugates, polyethylene glycol, or carbohydrates, at the amino-terminal end. The
truncated proteins may have a carboxyl group, an amido group, a t-butyloxycarbonyl group, or
a macromolecule, including but not limited to lipid-fatty acid conjugates, polyethylene glycol,
or carbohydrates, at the carboxy-terminal end.

The proteins of the invention may also include analogs of a KLK-L Protein, and/or
truncations thereof as described herein, which may include, but are not limited to, a KLK-L
Protein containing one or more amino acid substitutions, insertions, and/or deletions. Amino
acid substitutions may be of a conserved or non-conserved nature. Conserved amino acid
substitutions involve replacing one or more amino acids of a KLK-L Protein amino acid
sequence with amino acids of similar charge, size, and/or hydrophobicity characteristics. When
only conserved substitutions are made, the resulting analog is preferably functionally equivalent
to a KLK-L Protein. Non-conserved substitutions involve replacing one or more amino acids
of the KLK-L Protein amino acid sequence with one or more amino acids which possess
dissimilar charge, size, and/or hydrophobicity characteristics.
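Whether a substitution counts as conserved depends on how residues are grouped by charge, size and hydrophobicity. The sketch below assumes one simple grouping (aliphatic AVLIM, aromatic FWY, basic KRH, acidic DE, polar STNQ, with C, G and P each in a class of their own). Other groupings, or scores from a substitution matrix such as BLOSUM62, are equally common and can classify borderline pairs differently. The function compares two equal-length sequences position by position and labels each difference; its name is hypothetical.

```python
# One possible grouping by charge, size and hydrophobicity (an assumption, not a standard).
RESIDUE_GROUPS = {
    "aliphatic": "AVLIM",
    "aromatic": "FWY",
    "basic": "KRH",
    "acidic": "DE",
    "polar": "STNQ",
    "cysteine": "C",
    "glycine": "G",
    "proline": "P",
}
GROUP_OF = {aa: group for group, members in RESIDUE_GROUPS.items() for aa in members}


def classify_substitutions(reference: str, analog: str) -> list:
    """List (position, ref_aa, new_aa, kind) for every substituted position (1-based)."""
    if len(reference) != len(analog):
        raise ValueError("sequences must be the same length (substitutions only)")
    changes = []
    for pos, (ref_aa, new_aa) in enumerate(zip(reference.upper(), analog.upper()), start=1):
        if ref_aa != new_aa:
            same_group = GROUP_OF[ref_aa] == GROUP_OF[new_aa]
            changes.append((pos, ref_aa, new_aa, "conservative" if same_group else "non-conservative"))
    return changes


print(classify_substitutions("MKVLE", "MRVLK"))
# [(2, 'K', 'R', 'conservative'), (5, 'E', 'K', 'non-conservative')]
```

Lysine to arginine keeps a positive side chain, so it falls in the same class; glutamate to lysine reverses the charge and is labeled non-conservative.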

One or more amino acid insertions may be introduced into a KLK-L Protein. Amino acid 
insertions may consist of single amino acid residues or sequential amino acids ranging from 2
to 15 amino acids in length.

Deletions may consist of the removal of one or more amino acids, or discrete portions,
from a KLK-L Protein sequence. The deleted amino acids may or may not be contiguous. The
lower limit length of the resulting analog with a deletion mutation is about 10 amino acids,
preferably 100 amino acids.

The proteins of the invention include proteins with sequence identity or similarity to a
KLK-L Protein and/or truncations thereof as described herein. Such KLK-L Proteins include
proteins whose amino acid sequences comprise the amino acid sequences of KLK-L Protein
regions from other species, encoded by nucleic acids that hybridize under selected hybridization
conditions (see discussion of stringent hybridization conditions herein) with a probe used to
obtain a KLK-L Protein. These proteins will generally have the same regions which are
characteristic of a KLK-L Protein. Preferably, a protein will have substantial sequence identity,
for example about 50% identity, preferably 70 to 80% identity, more preferably at least 90% to
95% identity, and most preferably 98% identity, with the amino acid sequence shown in
Tables 2 to 9.

A percent amino acid sequence homology, similarity or identity is calculated as the 
percentage of aligned amino acids that match the reference sequence using known methods as 
described herein. 
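The calculation can be written out for two sequences that have already been aligned, with '-' marking gaps. The sketch assumes one common convention: identical residues divided by the number of alignment columns, skipping columns where both sequences have a gap and counting a gap in only one sequence as a mismatch. Programs differ on the denominator (some use the length of the shorter sequence or of the reference only), so the same alignment can give slightly different percentages. The function name and example sequences are hypothetical.

```python
def percent_identity(aligned_query: str, aligned_ref: str) -> float:
    """Percentage of alignment columns in which both sequences carry the same residue.

    Convention assumed here: columns where both sequences have a gap are skipped;
    columns with a gap in only one sequence count as mismatches.
    """
    if len(aligned_query) != len(aligned_ref):
        raise ValueError("aligned sequences must have the same length")
    columns = [(q, r) for q, r in zip(aligned_query, aligned_ref) if not (q == "-" and r == "-")]
    if not columns:
        raise ValueError("alignment contains no residues")
    matches = sum(1 for q, r in columns if q == r)
    return 100.0 * matches / len(columns)


pct = percent_identity("MKT-LVAQ", "MKSALVAQ")
print(pct, pct >= 70.0)  # 75.0 True
```

Here 6 of the 8 columns match, giving 75%, which would fall within the 70 to 80% identity range mentioned above.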

The invention also contemplates isoforms of the proteins of the invention. An isoform
contains the same number and kinds of amino acids as the protein of the invention, but the
isoform has a different molecular structure. The isoforms contemplated by the present invention
preferably have the same properties as the protein of the invention as described herein. 

The present invention also includes KLK-L Related Proteins conjugated with a selected 
protein, or a marker protein (see below), to produce fusion proteins. Additionally, immunogenic
portions of a KLK-L Protein and a KLK-L Related Protein are within the scope of the
invention.

A KLK-L Related Protein of the invention may be prepared using recombinant DNA 
methods. Accordingly, the nucleic acid molecules of the present invention having a sequence 
which encodes a KLK-L Related Protein of the invention may be incorporated in a known 
manner into an appropriate expression vector which ensures good expression of the protein. 
Possible expression vectors include but are not limited to cosmids, plasmids, or modified
viruses (e.g. replication defective retroviruses, adenoviruses and adeno-associated viruses), so 
long as the vector is compatible with the host cell used. 

The invention therefore contemplates a recombinant expression vector of the invention
containing a nucleic acid molecule of the invention and the necessary regulatory sequences for
the transcription and translation of the inserted protein sequence. Suitable regulatory sequences
may be derived from a variety of sources, including bacterial, fungal, viral, mammalian, or
insect genes (for example, see the regulatory sequences described in Goeddel, Gene Expression
Technology: Methods in Enzymology 185, Academic Press, San Diego, CA (1990)). Selection
of appropriate regulatory sequences is dependent on the host cell chosen, as discussed below, and
may be readily accomplished by one of ordinary skill in the art. The necessary regulatory
sequences may be supplied by the native KLK-L Protein and/or its flanking regions.

The invention further provides a recombinant expression vector comprising a DNA
nucleic acid molecule of the invention cloned into the expression vector in an antisense
orientation. That is, the DNA molecule is linked to a regulatory sequence in a manner which
allows for expression, by transcription of the DNA molecule, of an RNA molecule which is
antisense to the nucleic acid sequence of a protein of the invention or a fragment thereof.
Regulatory sequences linked to the antisense nucleic acid can be chosen which direct the
continuous expression of the antisense RNA molecule in a variety of cell types, for instance a
viral promoter and/or enhancer, or regulatory sequences can be chosen which direct tissue- or
cell-type-specific expression of antisense RNA.

The recombinant expression vectors of the invention may also contain a marker gene
which facilitates the selection of host cells transformed or transfected with a recombinant
molecule of the invention. Examples of marker genes are genes encoding proteins which confer
resistance to certain drugs, such as G418 and hygromycin, β-galactosidase, chloramphenicol
acetyltransferase, firefly luciferase, or an immunoglobulin or portion thereof, such as the Fc
portion of an immunoglobulin, preferably IgG. The markers can be introduced on a separate
vector from the nucleic acid of interest.

The recombinant expression vectors may also contain genes which encode a fusion
moiety which provides increased expression of the recombinant protein, increases the
solubility of the recombinant protein, and/or aids in the purification of the target recombinant
protein by acting as a ligand in affinity purification. For example, a proteolytic cleavage site
may be added to the target recombinant protein to allow separation of the recombinant protein
from the fusion moiety subsequent to purification of the fusion protein. Typical fusion
expression vectors include pGEX (Amrad Corp., Melbourne, Australia), pMAL (New England
Biolabs, Beverly, MA) and pRIT5 (Pharmacia, Piscataway, NJ), which fuse glutathione
S-transferase (GST), maltose E-binding protein, or protein A, respectively, to the recombinant
protein.

The recombinant expression vectors may be introduced into host cells to produce a
transformant host cell. "Transformant host cells" include host cells which have been transformed
or transfected with a recombinant expression vector of the invention. The terms "transformed
with", "transfected with", "transformation" and "transfection" encompass the introduction of a
nucleic acid (e.g. a vector) into a cell by one of many standard techniques. Prokaryotic cells can
be transformed with a nucleic acid by, for example, electroporation or calcium chloride-mediated
transformation. Nucleic acid can be introduced into mammalian cells via conventional
techniques such as calcium phosphate or calcium chloride co-precipitation, DEAE-dextran-mediated
transfection, lipofectin, electroporation or microinjection. Suitable methods for
transforming and transfecting host cells can be found in Sambrook et al. (Molecular Cloning:
A Laboratory Manual, 2nd Edition, Cold Spring Harbor Laboratory Press (1989)), and other
laboratory textbooks.

Suitable host cells include a wide variety of prokaryotic and eukaryotic host cells. For
example, the proteins of the invention may be expressed in bacterial cells such as E. coli, insect
cells (using baculovirus), yeast cells or mammalian cells. Other suitable host cells can be found
in Goeddel, Gene Expression Technology: Methods in Enzymology 185, Academic Press, San
Diego, CA (1991).

A host cell may also be chosen which modulates the expression of an inserted nucleic 
acid sequence, or modifies (e.g. glycosylation or phosphorylation) and processes (e.g. cleaves) 
the protein in a desired fashion. Host systems or cell lines may be selected which have specific 
and characteristic mechanisms for post-translational processing and modification of proteins. 
For example, eukaryotic host cells including CHO, VERO, BHK, HeLa, COS, MDCK, 293,
3T3, and WI38 may be used. For long-term high-yield stable expression of the protein, cell lines
and host systems which stably express the gene product may be engineered. 

Host cells and in particular cell lines produced using the methods described herein may 
be particularly useful in screening and evaluating compounds that modulate the activity of a 
KLK-L Related Protein. 

The proteins of the invention may also be expressed in non-human transgenic animals
including but not limited to mice, rats, rabbits, guinea pigs, micro-pigs, goats, sheep, pigs, and
non-human primates (e.g. baboons, monkeys, and chimpanzees) [see Hammer et al. (Nature
315:680-683, 1985), Palmiter et al. (Science 222:809-814, 1983), Brinster et al. (Proc. Natl.
Acad. Sci. USA 82:4438-4442, 1985), Palmiter and Brinster (Cell 41:343-345, 1985) and U.S.
Patent No. 4,736,866]. Procedures known in the art may be used to introduce a nucleic acid
molecule of the invention encoding a KLK-L Related Protein into animals to produce the
founder lines of transgenic animals. Such procedures include pronuclear microinjection,
retrovirus-mediated gene transfer into germ lines, gene targeting in embryonic stem cells,
electroporation of embryos, and sperm-mediated gene transfer.

The present invention contemplates transgenic animals that carry the KLK-L gene in
all their cells, and animals which carry the transgene in some but not all their cells. The
transgene may be integrated as a single transgene or in concatemers. The transgene may be
selectively introduced into and activated in specific cell types (see, for example, Lasko et al.,
1992, Proc. Natl. Acad. Sci. USA 89:6236). The transgene may be integrated into the
chromosomal site of the endogenous gene by gene targeting. The transgene may be selectively
introduced into a particular cell type, thereby inactivating the endogenous gene in that cell type
(see Gu et al., Science 265:103-106).

The expression of a recombinant KLK-L Related Protein in a transgenic animal may be
assayed using standard techniques. Initial screening may be conducted by Southern blot
analysis or PCR methods to determine whether the transgene has been integrated. The level of
mRNA expression in the tissues of transgenic animals may also be assessed using techniques
including Northern blot analysis of tissue samples, in situ hybridization, and RT-PCR. Tissue
may also be evaluated immunocytochemically using antibodies against KLK-L Protein.

Proteins of the invention may also be prepared by chemical synthesis using techniques
well known in the chemistry of proteins, such as solid phase synthesis (Merrifield, 1964, J. Am.
Chem. Soc. 85:2149-2154) or synthesis in homogeneous solution (Houben-Weyl, 1987,
Methods of Organic Chemistry, ed. E. Wünsch, Vol. 15 I and II, Thieme, Stuttgart).

N-terminal or C-terminal fusion proteins comprising a KLK-L Related Protein of the
invention conjugated with other molecules, such as proteins, may be prepared by fusing, through
recombinant techniques, the N-terminal or C-terminal of a KLK-L Related Protein and the
sequence of a selected protein or marker protein with a desired biological function. The resultant
fusion proteins contain KLK-L Protein fused to the selected protein or marker protein as
described herein. Examples of proteins which may be used to prepare fusion proteins include
immunoglobulins, glutathione-S-transferase (GST), hemagglutinin (HA), and truncated myc.
3. Antibodies 

KLK-L Related Proteins of the invention can be used to prepare antibodies specific for
the proteins. Antibodies can be prepared which bind a distinct epitope in an unconserved region
of the protein. An unconserved region of the protein is one that does not have substantial
sequence homology to other proteins. A conserved region, such as a well-characterized domain,
can also be used to prepare an antibody to a conserved region of a KLK-L
Related Protein. Antibodies having specificity for a KLK-L Related Protein may also be raised
from fusion proteins created by expressing fusion proteins in bacteria as described herein.

The invention can employ intact monoclonal or polyclonal antibodies, and 
immunologically active fragments (e.g. a Fab or (Fab)2 fragment, or Fab expression library
fragments and epitope-binding fragments thereof), an antibody heavy chain, an antibody light
chain, a genetically engineered single chain Fv molecule (Ladner et al., U.S. Pat. No. 4,946,778),
or a chimeric antibody, for example, an antibody which contains the binding specificity of a 
murine antibody, but in which the remaining portions are of human origin. Antibodies including 
monoclonal and polyclonal antibodies, fragments and chimeras, may be prepared using methods 
known to those skilled in the art. 

4. Applications of the Nucleic Acid Molecules, KLK-L Related Proteins, and
Antibodies of the Invention

The nucleic acid molecules, KLK-L Related Proteins, and antibodies of the invention
may be used in the prognostic and diagnostic evaluation of cancer (e.g. breast and prostate
cancer), and the identification of subjects with a predisposition to cancer (Sections 4.1.1 and
4.1.2). Methods for detecting nucleic acid molecules and KLK-L Related Proteins of the
invention can be used to monitor cancer by detecting KLK-L Related Proteins and nucleic acid
molecules encoding KLK-L Related Proteins. It will also be apparent to one skilled in the art
that the methods described herein may be used to study the developmental expression of KLK-L
Related Proteins and, accordingly, will provide further insight into the role of KLK-L Related
Proteins. The applications of the present invention also include methods for the identification
of compounds that modulate the biological activity of KLK-L or KLK-L Related Proteins
(Section 4.2). The compounds, antibodies, etc. may be used for the treatment of cancer (Section
4.3).

4.1 Diagnostic Methods

A variety of methods can be employed for the diagnostic and prognostic evaluation of 
cancer (e.g. breast and prostate cancer), and the identification of subjects with a predisposition 
to cancer. Such methods may, for example, utilize nucleic acid molecules of the invention, and 
fragments thereof, and antibodies directed against KLK-L Related Proteins, including peptide 
fragments. In particular, the nucleic acids and antibodies may be used, for example, for: (1) the 
detection of the presence of KLK-L mutations, or the detection of either over- or under- 
expression of KLK-L mRNA relative to a non-disorder state or the qualitative or quantitative 
detection of alternatively spliced forms of KLK-L transcripts which may correlate with certain 
conditions or susceptibility toward such conditions; and (2) the detection of either an over- or 
an under-abundance of KLK-L Related Proteins relative to a non-disorder state or the presence
of a modified (e.g., less than full length) KLK-L Protein which correlates with a disorder state, 
or a progression toward a disorder state. 

The methods described herein may be performed by utilizing pre-packaged diagnostic 
kits comprising at least one specific KLK-L nucleic acid or antibody described herein, which
may be conveniently used, e.g., in clinical settings, to screen and diagnose patients and to screen 
and identify those individuals exhibiting a predisposition to developing a disorder. 

Nucleic acid-based detection techniques are described, below, in Section 4.1.1. Peptide 
detection techniques are described, below, in Section 4.1.2. The samples that may be analyzed 
using the methods of the invention include those which are known or suspected to express KLK- 
L or contain KLK-L Related Proteins. The samples may be derived from a patient or a cell 
culture, and include but are not limited to biological fluids, tissue extracts, freshly harvested 
cells, and lysates of cells which have been incubated in cell cultures. 
4.1.1 Methods for Detecting Nucleic Acid Molecules of the Invention

The nucleic acid molecules of the invention allow those skilled in the art to construct 
nucleotide probes for use in the detection of nucleic acid sequences of the invention in samples. 
Suitable probes include nucleic acid molecules based on nucleic acid sequences encoding at 
least 5 sequential amino acids from regions of the KLK-L Protein; preferably they comprise
15 to 30 nucleotides. A nucleotide probe may be labeled with a detectable substance such as a
radioactive label which provides for an adequate signal and has sufficient half-life, such as ³²P,
³H, ¹⁴C or the like. Other detectable substances which may be used include antigens that are
recognized by a specific labeled antibody, fluorescent compounds, enzymes, antibodies specific
for a labeled antigen, and luminescent compounds. An appropriate label may be selected having
regard to the rate of hybridization and binding of the probe to the nucleotide to be detected and
the amount of nucleotide available for hybridization. Labeled probes may be hybridized to
nucleic acids on solid supports such as nitrocellulose filters or nylon membranes as generally
described in Sambrook et al. 1989, Molecular Cloning, A Laboratory Manual (2nd ed.). The
nucleic acid probes may be used to detect genes, preferably in human cells, that encode KLK-L
Related Proteins. The nucleotide probes may also be useful in the diagnosis of cancer, in
monitoring the progression of cancer, or in monitoring a therapeutic treatment.
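A stretch of 5 sequential amino acids corresponds to 5 × 3 = 15 nucleotides of coding sequence, which is the lower end of the preferred 15 to 30 nucleotide probe length. The Python sketch below scans a coding sequence for candidate probe windows of a chosen length and reports an approximate melting temperature. Two parts of it are assumptions added for illustration and are not criteria stated in this specification: the 40-60% GC filter, and the Wallace rule (Tm ≈ 2 °C per A/T plus 4 °C per G/C, a rough estimate generally applied only to short oligonucleotides). The function names are hypothetical.

```python
def wallace_tm(probe: str) -> int:
    """Rough melting temperature (deg C) by the Wallace rule: 2*(A+T) + 4*(G+C)."""
    p = probe.upper()
    return 2 * (p.count("A") + p.count("T")) + 4 * (p.count("G") + p.count("C"))


def candidate_probes(cds: str, length: int = 15, min_gc: float = 0.4, max_gc: float = 0.6):
    """List (start, probe, Tm) for every window of `length` nt whose GC fraction is in range.

    Hypothetical helper; the GC window is an illustrative assumption.
    """
    if not 15 <= length <= 30:
        raise ValueError("probe length outside the preferred 15-30 nt range")
    seq = cds.upper()
    probes = []
    for start in range(len(seq) - length + 1):
        window = seq[start:start + length]
        gc = (window.count("G") + window.count("C")) / length
        if min_gc <= gc <= max_gc:
            probes.append((start, window, wallace_tm(window)))
    return probes
```

In practice, candidates passing such a filter would still need to be checked for uniqueness against related kallikrein sequences before use as probes.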

The probe may be used in hybridization techniques to detect genes that encode KLK-L 
Related Proteins. The technique generally involves contacting and incubating nucleic acids (e.g. 
recombinant DNA molecules, cloned genes) obtained from a sample from a patient or other 
cellular source with a probe of the present invention under conditions favorable for the specific 
annealing of the probes to complementary sequences in the nucleic acids. After incubation, the 
non-annealed nucleic acids are removed, and the presence of nucleic acids that have hybridized
to the probe, if any, is detected.

The detection of nucleic acid molecules of the invention may involve the amplification 
of specific gene sequences using an amplification method such as PCR, followed by the analysis
of the amplified molecules using techniques known to those skilled in the art. Suitable primers 
can be routinely designed by one of skill in the art. 
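The amplification step can be pictured as an in silico PCR. The forward primer matches the top strand, and the reverse primer matches the reverse complement of the top strand further downstream. The sketch below is a simplified, hypothetical helper that assumes exact primer matches and uses the first binding site found for each primer. Real primer design would also consider mismatches, melting temperature and secondary structure.

```python
from typing import Optional

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def predict_amplicon(template: str, forward: str, reverse: str) -> Optional[str]:
    """Return the predicted PCR product on the top strand, or None if a primer fails to match.

    Hypothetical helper: exact matches only, first binding site used.
    """
    top = template.upper()
    start = top.find(forward.upper())
    if start == -1:
        return None
    # The reverse primer anneals to the complementary strand, so look for its
    # reverse complement on the top strand downstream of the forward primer.
    reverse_site = reverse_complement(reverse)
    end = top.find(reverse_site, start + len(forward))
    if end == -1:
        return None
    return top[start:end + len(reverse_site)]
```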

Genomic DNA may be used in hybridization or amplification assays of biological 
samples to detect abnormalities involving klk-l structure, including point mutations, insertions,
deletions, and chromosomal rearrangements. For example, direct sequencing, single-stranded
conformational polymorphism analysis, heteroduplex analysis, denaturing gradient gel
electrophoresis, chemical mismatch cleavage, and oligonucleotide hybridization may be utilized.

Genotyping techniques known to one skilled in the art can be used to type
polymorphisms that are in close proximity to the mutations in a klk-l gene. The polymorphisms
may be used to identify individuals in families that are likely to carry mutations. If a
polymorphism exhibits linkage disequilibrium with mutations in a klk-l gene, it can also be used
to screen for individuals in the general population likely to carry mutations. Polymorphisms
which may be used include restriction fragment length polymorphisms (RFLPs), single-base
polymorphisms, and simple sequence repeat polymorphisms (SSLPs).
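Linkage disequilibrium between a marker polymorphism and a klk-l mutation is commonly summarized with the standard population-genetics statistics D, D' and r². The Python sketch below computes them from phased two-locus haplotype counts. The allele labels (A/a for the marker, B/b for the mutation), the function name and the example counts are illustrative assumptions, not data from the invention.

```python
from dataclasses import dataclass


@dataclass
class LDStats:
    d: float        # D = p(AB) - p(A) * p(B)
    d_prime: float  # D / Dmax, keeping the sign of D
    r2: float       # squared allelic correlation between the two loci


def linkage_disequilibrium(n_AB: int, n_Ab: int, n_aB: int, n_ab: int) -> LDStats:
    """LD statistics from phased haplotype counts (A/a: marker allele, B/b: mutation)."""
    total = n_AB + n_Ab + n_aB + n_ab
    if total == 0:
        raise ValueError("no haplotypes counted")
    p_AB = n_AB / total
    p_A = (n_AB + n_Ab) / total
    p_B = (n_AB + n_aB) / total
    d = p_AB - p_A * p_B
    # The maximum attainable |D| depends on the sign of D.
    if d >= 0:
        d_max = min(p_A * (1 - p_B), (1 - p_A) * p_B)
    else:
        d_max = min(p_A * p_B, (1 - p_A) * (1 - p_B))
    d_prime = d / d_max if d_max > 0 else 0.0
    denom = p_A * (1 - p_A) * p_B * (1 - p_B)
    r2 = d * d / denom if denom > 0 else 0.0
    return LDStats(d, d_prime, r2)
```

With counts of 50, 0, 0 and 50, both D' and r² equal 1. In that sample, the marker allele would then predict the mutation-bearing haplotype exactly.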

A probe of the invention may be used to directly identify RFLPs. A probe or primer of 
the invention can additionally be used to isolate genomic clones such as YACs, BACs, PACs,
cosmids, phage or plasmids. The DNA in the clones can be screened for SSLPs using
hybridization or sequencing procedures.
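An RFLP arises when a sequence variant creates or destroys a restriction site, which changes the sizes of the fragments that a probe detects. The sketch below simulates a single-enzyme digest of a linear sequence. The default site is EcoRI (GAATTC, cut between G and A on the top strand). The helper name and the toy sequences are hypothetical. The fragment sizes are top-strand lengths and ignore the 4-nucleotide overhangs left by EcoRI.

```python
from typing import List


def digest_fragments(seq: str, site: str = "GAATTC", cut_offset: int = 1) -> List[int]:
    """Top-strand fragment lengths after cutting a linear sequence at every site.

    cut_offset is the cut position within the site (EcoRI cuts G^AATTC, i.e. offset 1).
    Hypothetical helper for illustration only.
    """
    s, motif = seq.upper(), site.upper()
    cuts = []
    pos = s.find(motif)
    while pos != -1:
        cuts.append(pos + cut_offset)
        pos = s.find(motif, pos + 1)
    bounds = [0] + cuts + [len(s)]
    return [end - start for start, end in zip(bounds, bounds[1:])]
```

For example, `digest_fragments("AAAAGAATTCTTTT")` gives fragments of 5 and 9 nucleotides. A single base change that destroys the site, as in `"AAAAGAATTATTTT"`, leaves one uncut fragment of 14 nucleotides.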

Hybridization and amplification techniques described herein may be used to assay 
qualitative and quantitative aspects of klk-l expression. For example, RNA may be isolated from 
a cell type or tissue known to express klk-l and tested utilizing the hybridization (e.g. standard
Northern analyses) or PCR techniques referred to herein. The techniques may be used to detect
differences in transcript size which may be due to normal or abnormal alternative splicing. The
techniques may be used to detect quantitative differences between levels of full-length and/or
alternatively spliced transcripts detected in normal individuals relative to those individuals
exhibiting cancer symptoms or other disease conditions.
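The specification does not prescribe how quantitative PCR data are to be compared. One widely used option, shown here as an assumption rather than as the method of the invention, is the comparative Ct (2^-ΔΔCt) approach. In this approach, the klk-l signal is normalized first to a reference transcript and then to a control sample. It presumes that the target and reference amplify with similar, near-100% efficiency. The function name and the Ct values below are hypothetical.

```python
def fold_change_ddct(ct_target_test: float, ct_ref_test: float,
                     ct_target_control: float, ct_ref_control: float) -> float:
    """Relative target expression in a test sample versus a control sample (2^-ddCt).

    Assumes target and reference amplify with similar, near-100% efficiency.
    """
    delta_test = ct_target_test - ct_ref_test            # normalize test sample to reference
    delta_control = ct_target_control - ct_ref_control   # normalize control sample to reference
    delta_delta = delta_test - delta_control
    return 2.0 ** (-delta_delta)
```

With hypothetical Ct values of 22 (target) and 18 (reference) in a test sample, and 25 and 18 in a control, ΔΔCt is -3. That corresponds to roughly 8-fold higher relative abundance in the test sample under the stated assumptions.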

The primers and probes may be used in the above-described methods in situ, i.e. directly
on tissue sections (fixed and/or frozen) of patient tissue obtained from biopsies or resections.
4.1.2 Methods for Detecting KLK-L Related Proteins

Antibodies specifically reactive with a KLK-L Related Protein, or derivatives such as
enzyme conjugates or labeled derivatives, may be used to detect KLK-L Related Proteins in
various samples (e.g. biological materials). They may be used as diagnostic or prognostic
reagents, and they may be used to detect abnormalities in the level of KLK-L Related Protein
expression, or abnormalities in the structure and/or temporal, tissue, cellular, or subcellular
location of a KLK-L Related Protein. Antibodies may also be used to screen potentially
therapeutic compounds in vitro to determine their effects on cancer and other conditions. In
vitro immunoassays may also be used to assess or monitor the efficacy of particular therapies.

The antibodies of the invention may also be used in vitro to determine the level of KLK-L
expression in cells genetically engineered to produce a KLK-L Related Protein.

The antibodies may be used in any known immunoassays which rely on the binding
interaction between an antigenic determinant of a KLK-L Related Protein and the antibodies.
Examples of such assays are radioimmunoassays, enzyme immunoassays (e.g. ELISA),
immunofluorescence, immunoprecipitation, latex agglutination, hemagglutination, and
histochemical tests. The antibodies may be used to detect and quantify KLK-L Related Proteins
in a sample in order to determine their role in particular cellular events or pathological states,
and to diagnose and treat such pathological states.

In particular, the antibodies of the invention may be used in immunohistochemical
analyses, for example at the cellular and subcellular level, to detect a KLK-L Related
Protein, to localize it to particular cells and tissues and to specific subcellular locations, and to
quantitate the level of expression.

Cytochemical techniques known in the art for localizing antigens using light and
electron microscopy may be used to detect a KLK-L Related Protein. Generally, an antibody
of the invention may be labeled with a detectable substance, and a KLK-L Related Protein may
be localised in tissues and cells based upon the presence of the detectable substance. Examples
of detectable substances include, but are not limited to, the following: radioisotopes (e.g., ³H,
¹⁴C, ³⁵S, ¹²⁵I), fluorescent labels (e.g., FITC, rhodamine, lanthanide phosphors), luminescent
labels such as luminol, enzymatic labels (e.g., horseradish peroxidase, beta-galactosidase,
luciferase, alkaline phosphatase, acetylcholinesterase), biotinyl groups (which can be detected
by marked avidin, e.g., streptavidin containing a fluorescent marker or enzymatic activity that
can be detected by optical or colorimetric methods), and predetermined polypeptide epitopes
recognized by a secondary reporter (e.g., leucine zipper pair sequences, binding sites for
secondary antibodies, metal binding domains, epitope tags). In some embodiments, labels are
attached via spacer arms of various lengths to reduce potential steric hindrance. Antibodies may
also be coupled to electron-dense substances, such as ferritin or colloidal gold, which are readily
visualised by electron microscopy.

The antibody or sample may be immobilized on a carrier or solid support which is
capable of immobilizing cells, antibodies, etc. For example, the carrier or support may be
nitrocellulose, or glass, polyacrylamides, gabbros, and magnetite. The support material may
have any possible configuration including spherical (e.g. bead), cylindrical (e.g. inside surface
of a test tube or well, or the external surface of a rod), or flat (e.g. sheet, test strip). Indirect
methods may also be employed in which the primary antigen-antibody reaction is amplified by
the introduction of a second antibody having specificity for the antibody reactive against
KLK-L Related Protein. By way of example, if the antibody having specificity against a KLK-L
Related Protein is a rabbit IgG antibody, the second antibody may be goat anti-rabbit
gamma-globulin labeled with a detectable substance as described herein.

Where a radioactive label is used as a detectable substance, a KLK-L Related Protein 
may be localized by radioautography. The results of radioautography may be quantitated by 
determining the density of particles in the radioautographs by various optical methods, or by 
counting the grains.

4.2 Methods for Identifying or Evaluating Substances/Compounds

The methods described herein are designed to identify substances that modulate the
biological activity of a KLK-L Related Protein, including substances that bind to KLK-L
Related Proteins or bind to other proteins that interact with a KLK-L Related Protein, and
compounds that interfere with or enhance the interaction of a KLK-L Related Protein and
substances that bind to the KLK-L Related Protein or other proteins that interact with a KLK-L
Related Protein. Methods are also utilized that identify compounds that bind to KLK-L
regulatory sequences.

The substances and compounds identified using the methods of the invention include
but are not limited to peptides such as soluble peptides including Ig-tailed fusion peptides,
members of random peptide libraries and combinatorial chemistry-derived molecular libraries
made of D- and/or L-configuration amino acids, phosphopeptides (including members of
random or partially degenerate, directed phosphopeptide libraries), antibodies [e.g. polyclonal,
monoclonal, humanized, anti-idiotypic, chimeric, and single chain antibodies, and antibody
fragments (e.g. Fab, F(ab)2, and Fab expression library fragments, and epitope-binding
fragments thereof)], and small organic or inorganic molecules. The substance or compound may
be an endogenous physiological compound or it may be a natural or synthetic compound.

Substances which modulate a KLK-L Related Protein can be identified based on their
ability to bind to a KLK-L Related Protein. Therefore, the invention also provides methods for
identifying substances which bind to a KLK-L Related Protein. Substances identified using the
methods of the invention may be isolated, cloned and sequenced using conventional techniques.

Substances which can bind with a KLK-L Related Protein may be identified by reacting
a KLK-L Related Protein with a test substance which potentially binds to a KLK-L Related
Protein, under conditions which permit the formation of substance-KLK-L Related Protein
complexes, and removing and/or detecting the complexes. The complexes can be detected by
assaying for substance-KLK-L Related Protein complexes, for free substance, or for
non-complexed KLK-L Related Protein. Conditions which permit the formation of
substance-KLK-L Related Protein complexes may be selected having regard to factors such as
the nature and amounts of the substance and the protein.

The substance-protein complex, free substance or non-complexed proteins may be
isolated by conventional isolation techniques, for example, salting out, chromatography,
electrophoresis, gel filtration, fractionation, absorption, polyacrylamide gel electrophoresis,
agglutination, or combinations thereof. To facilitate the assay of the components, antibody
against KLK-L Related Protein or the substance, or labeled KLK-L Related Protein, or a
labeled substance may be utilized. The antibodies, proteins, or substances may be labeled with
a detectable substance as described above.

A KLK-L Related Protein or the substance used in the method of the invention may be
insolubilized. For example, a KLK-L Related Protein or substance may be bound to a suitable
carrier such as agarose, cellulose, dextran, Sephadex, Sepharose, carboxymethyl cellulose,
polystyrene, filter paper, ion-exchange resin, plastic film, plastic tube, glass beads,
polyamine-methyl vinyl-ether-maleic acid copolymer, amino acid copolymer, ethylene-maleic acid
copolymer, nylon, silk, etc. The carrier may be in the shape of, for example, a tube, test plate,
beads, disc, sphere, etc. The insolubilized protein or substance may be prepared by reacting the
material with a suitable insoluble carrier using known chemical or physical methods, for
example, cyanogen bromide coupling.

The invention also contemplates a method for evaluating a compound for its ability to
modulate the biological activity of a KLK-L Related Protein of the invention, by assaying for
an agonist or antagonist (i.e. enhancer or inhibitor) of the binding of a KLK-L Related Protein
with a substance which binds with a KLK-L Related Protein. The basic method for evaluating
whether a compound is an agonist or antagonist of the binding of a KLK-L Related Protein and a
substance that binds to the protein is to prepare a reaction mixture containing the KLK-L
Related Protein and the substance under conditions which permit the formation of
substance-KLK-L Related Protein complexes, in the presence of a test compound. The test compound may
be initially added to the mixture, or may be added subsequent to the addition of the KLK-L
Related Protein and substance. Control reaction mixtures without the test compound or with a
placebo are also prepared. The formation of complexes is detected, and the formation of
complexes in the control reaction but not in the reaction mixture indicates that the test
compound interferes with the interaction of the KLK-L Related Protein and substance. The
reactions may be carried out in the liquid phase, or the KLK-L Related Protein, substance, or
test compound may be immobilized as described herein. The ability of a compound to modulate
the biological activity of a KLK-L Related Protein of the invention may be tested by
determining the biological effects on cells.
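The comparison between the test reaction and the control reaction is often expressed as percent inhibition of complex formation. The minimal sketch below assumes two things: that the complexes yield a numeric signal (for example from a labeled KLK-L Related Protein or a labeled substance), and that a background reading without complex formation is available. The function name, the background term and the sample values are assumptions.

```python
def percent_inhibition(control_signal: float, test_signal: float, background: float = 0.0) -> float:
    """Percent reduction in complex signal caused by a test compound.

    control_signal: complexes formed without the test compound (or with placebo)
    test_signal:    complexes formed in the presence of the test compound
    background:     signal with no complex formation (assumed to be measurable)
    """
    window = control_signal - background
    if window <= 0:
        raise ValueError("control signal must exceed background")
    return 100.0 * (control_signal - test_signal) / window
```

A positive value suggests that the test compound interferes with binding (a candidate antagonist). A negative value suggests that it enhances binding (a candidate agonist). Any cut-off for calling a hit would be a further, assay-specific choice.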

It will be understood that the agonists and antagonists, i.e. enhancers and inhibitors, that
can be assayed using the methods of the invention may act on one or more of the binding sites
on the protein or substance, including agonist binding sites, competitive antagonist binding sites,
non-competitive antagonist binding sites or allosteric sites.

The invention also makes it possible to screen for antagonists that inhibit the effects of 
an agonist of the interaction of KLK-L Related Protein with a substance which is capable of 
binding to the KLK-L Related Protein. Thus, the invention may be used to assay for a 
compound that competes for the same binding site of a KLK-L Related Protein. 

The invention also contemplates methods for identifying compounds that bind to 
proteins that interact with a KLK-L Related Protein. Protein-protein interactions may be 
identified using conventional methods such as co-immunoprecipitation, crosslinking and co- 
purification through gradients or chromatographic columns. Methods may also be employed that 
result in the simultaneous identification of genes which encode proteins interacting with a KLK-
L Related Protein. These methods include probing expression libraries with labeled KLK-L
Related Protein. 

Two-hybrid systems may also be used to detect protein interactions in vivo. Generally, 
plasmids are constructed that encode two hybrid proteins. A first hybrid protein consists of the 
DNA-binding domain of a transcription activator protein fused to a KLK-L Related Protein, and 
the second hybrid protein consists of the transcription activator protein's activator domain fused 
to an unknown protein encoded by a cDNA which has been recombined into the plasmid as part 
of a cDNA library. The plasmids are transformed into a strain of yeast (e.g. S. cerevisiae) that
contains a reporter gene (e.g. lacZ, luciferase, alkaline phosphatase, horseradish peroxidase) 
whose regulatory region contains the transcription activator's binding site. The hybrid proteins 
alone cannot activate the transcription of the reporter gene. However, interaction of the two 
hybrid proteins reconstitutes the functional activator protein and results in expression of the 
reporter gene, which is detected by an assay for the reporter gene product. 
It will be appreciated that fusion proteins may be used in the above-described methods.
In particular, KLK-L Related Proteins fused to a glutathione-S-transferase may be used in the 
methods. 

The reagents suitable for applying the methods of the invention to evaluate compounds 
that modulate a KLK-L Related Protein may be packaged into convenient kits providing the 
necessary materials packaged into suitable containers. The kits may also include suitable 
supports useful in performing the methods of the invention. 
4.3 Compositions and Treatments 

The substances or compounds identified by the methods described herein, antibodies, 
and antisense nucleic acid molecules of the invention, and peptides may be used for modulating 
the biological activity of a KLK-L Related Protein, and they may be used in the treatment of 
conditions such as cancer (e.g. prostate or breast cancer). Accordingly, the substances, 
antibodies, peptides, and compounds may be formulated into pharmaceutical compositions for 
administration to subjects in a biologically compatible form suitable for administration in vivo. 
By "biologically compatible form suitable for administration in vivo" is meant a form of the 
active substance to be administered in which any toxic effects are outweighed by the therapeutic 
effects. The active substances may be administered to living organisms including humans, and 
animals. A therapeutically active amount of a pharmaceutical composition of
the present invention is defined as an amount effective, at dosages and for periods of time
necessary, to achieve the desired result. For example, a therapeutically active amount of a
substance may vary according to factors such as the disease state, age, sex, and weight of the
individual, and the ability of the substance to elicit a desired response in the individual. Dosage
regimens may be adjusted to provide the optimum therapeutic response. For example, several
divided doses may be administered daily or the dose may be proportionally reduced as indicated
by the exigencies of the therapeutic situation.

The active substance may be administered in a convenient manner such as by injection
(subcutaneous, intravenous, etc.), oral administration, inhalation, transdermal application, or
rectal administration. Depending on the route of administration, the active substance may be
coated in a material to protect the substance from the action of enzymes, acids and other natural
conditions that may inactivate the substance.

The compositions described herein can be prepared by per se known methods for the
preparation of pharmaceutically acceptable compositions which can be administered to subjects,
such that an effective quantity of the active substance is combined in a mixture with a
pharmaceutically acceptable vehicle. Suitable vehicles are described, for example, in
Remington's Pharmaceutical Sciences (Remington's Pharmaceutical Sciences, Mack Publishing
Company, Easton, Pa., USA 1985). On this basis, the compositions include, albeit not
exclusively, solutions of the active substances in association with one or more pharmaceutically
acceptable vehicles or diluents, contained in buffered solutions with a suitable pH and
iso-osmotic with the physiological fluids.

The activity of the substances, compounds, antibodies, antisense nucleic acid molecules, 
and compositions of the invention may be confirmed in animal experimental model systems. 

The following non-limiting example is illustrative of the present invention:
Example 

MATERIALS AND METHODS 

Identification of positive PAC and BAC genomic clones from a human genomic DNA
library

The sequences of the PSA, KLK1, KLK2, NES1 and Zyme genes are already known.
Polymerase chain reaction (PCR)-based amplification protocols were developed which
allowed generation of PCR products specific for each one of these genes. Using these PCR
products as labeled probes, a human genomic DNA PAC library and a human genomic
DNA BAC library were screened to identify positive clones approximately
100-150 Kb long. The general strategies for these experiments have been published elsewhere
(14). The genomic libraries were spotted in duplicate on nylon membranes, and positive clones
were further confirmed by Southern blot analysis as described (14).
DNA sequences on chromosome 19 

The Lawrence Livermore National Laboratory participates in the Human Genome Project
and focuses on sequencing chromosome 19. Extensive sequence information on this
chromosome is available at the website of the Lawrence Livermore National
Laboratory (http://www-bio.llnl.gov/genome/genmome.html).

Approximately 300 Kb of genomic sequence was obtained from that website,
encompassing a region on chromosome 19q13.3 - q13.4 where the known kallikrein genes are
localized. This 300 Kb of sequence is represented by 9 contigs of variable lengths. Using
a number of different computer programs, an almost contiguous sequence of the region was
established, as shown diagrammatically in Figure 1. Some of the contigs were reversed, as shown
in Figure 1, in order to reconstruct the area on both strands of DNA.
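Reorienting a contig so that it reads on the same strand as its neighbours amounts to taking its reverse complement. The following Python is a minimal sketch of that single step. The contig names and sequences are hypothetical. The actual assembly of the nine contigs relied on the computer programs mentioned above, not on this code.

```python
# Minimal sketch: reverse-complementing selected contigs so that all of them
# read on the same DNA strand. Contig IDs and sequences are hypothetical.

# Watson-Crick complement; N (unknown base) stays N.
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def orient_contigs(contigs: dict[str, str], reversed_ids: set[str]) -> dict[str, str]:
    """Reverse-complement the contigs listed in reversed_ids; keep the others."""
    return {
        cid: reverse_complement(seq) if cid in reversed_ids else seq
        for cid, seq in contigs.items()
    }


if __name__ == "__main__":
    contigs = {"contig_A": "ATGGAATTCCGT", "contig_B": "ACGGAATTCCAT"}
    oriented = orient_contigs(contigs, reversed_ids={"contig_B"})
    print(oriented["contig_B"])  # ATGGAATTCCGT
```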

Using the published sequences of PSA, KLK2, NES1 and protease M, the computer software
BLAST 2 and alignment strategies, the relative positions of these genes on the contiguous map
were identified (Figure 1). These known genes served as landmarks for further studies. An
EcoRI restriction map of the area is also available at the website of the Lawrence Livermore
National Laboratory. Using this restriction map and the computer program WebCutter
(http://www.firstmarket.com/cutter/cut2.html), a restriction analysis of the available sequence
was performed to further confirm the assignment and relative positions of these contigs along
chromosome 19. The resulting configuration is presented in Figure 1.
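The restriction check above compares EcoRI sites predicted from the sequence with the published EcoRI map. As a hedged illustration of the sequence side of that comparison, here is a minimal Python sketch that finds EcoRI sites (G^AATTC) and predicts the fragment sizes of a complete digest. It is not WebCutter's implementation, and the example sequence is hypothetical.

```python
# Minimal sketch: locating EcoRI sites (G^AATTC) in a sequence and predicting
# the fragment sizes of a complete digest. The example sequence is hypothetical.
import re

ECORI_SITE = "GAATTC"
CUT_OFFSET = 1  # EcoRI cuts the top strand between G and AATTC


def ecori_cut_positions(seq: str) -> list[int]:
    """0-based top-strand cut positions for every EcoRI site in seq."""
    return [m.start() + CUT_OFFSET for m in re.finditer(ECORI_SITE, seq.upper())]


def fragment_lengths(seq: str) -> list[int]:
    """Lengths of the linear fragments produced by complete EcoRI digestion."""
    cuts = [0] + ecori_cut_positions(seq) + [len(seq)]
    return [end - start for start, end in zip(cuts, cuts[1:])]


if __name__ == "__main__":
    seq = "AAGAATTCTTTTGAATTCAA"
    print(ecori_cut_positions(seq))  # [3, 13]
    print(fragment_lengths(seq))     # [3, 10, 7]
```

Fragment lengths computed this way for each contig can then be compared with the fragment sizes on the published map to check contig order and orientation.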

Gene prediction analysis 

For exon prediction analysis of the whole genomic area, a number of different computer
programs were used; these programs are listed in Table 1. All of them were initially
tested using the known genomic sequences of the PSA, protease M and NES1 genes. The most
reliable programs, GeneBuilder (gene prediction), GeneBuilder (exon prediction),
Grail 2 and GENEID-3, were selected for further use.

Protein homology searching 

The putative exons of the new genes were first translated into the corresponding amino acid
sequences. BLAST homology searches for the proteins encoded by the exons of the putative
new genes were performed using the BLASTP program and the GenBank databases.

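Translating a putative exon requires a reading frame, and internal exons can be in any of the three frames. The following minimal Python sketch translates an exon in each forward frame with the standard genetic code and keeps only the frames that contain no stop codon. The exon sequence is hypothetical. A real pipeline would take the frame reported by the gene prediction programs rather than test all three.

```python
# Minimal sketch: translating a putative exon in each forward reading frame
# with the standard genetic code (NCBI translation table 1).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(seq: str, frame: int = 0) -> str:
    """Translate seq from frame 0, 1 or 2; '*' = stop, 'X' = unknown codon."""
    seq = seq.upper().replace("U", "T")
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(frame, len(seq) - 2, 3)
    )


def open_frames(seq: str) -> dict[int, str]:
    """Forward frames whose translation contains no stop codon."""
    frames = {f: translate(seq, f) for f in range(3)}
    return {f: p for f, p in frames.items() if "*" not in p}


if __name__ == "__main__":
    exon = "ATGGCCTGGTAA"
    print(translate(exon))    # MAW*
    print(open_frames(exon))  # {1: 'WPG', 2: 'GLV'}
```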
RESULTS 

Relative position of PSA, KLK2, Zyme and NES1 on chromosome 19

Screening of the human BAC library identified two clones that were positive for the
Zyme gene (clones BAC 288H1 and BAC 76F7). These BACs were further analyzed by PCR
with primers specific for PSA, NES1, KLK1 and KLK2. These analyses indicated that both
BACs were positive for Zyme, PSA and KLK2 and negative for the KLK1 and NES1 genes.

Screening of the human PAC genomic library identified a PAC clone that was
positive for NES1 (clone PAC 34B1). Further PCR analysis indicated that this PAC clone was
positive for the NES1 and KLK1 genes and negative for PSA, KLK2 and Zyme. Combining
this information with the EcoRI restriction map of the region allowed establishment of the
relative positions of these four genes: PSA is the most centromeric, followed by KLK2, Zyme
and NES1. Further alignment of the known sequences of these genes with the 300 Kb contig
enabled precise localization of all four genes and determination of their direction of transcription,
as shown by the arrows in Figure 1. The KLK1 gene sequence was not identified on this contig
and appears to lie further telomeric to NES1.

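Clone content by itself constrains gene order only weakly. If each clone is assumed to be one contiguous stretch of DNA, the genes it contains must be adjacent in the true order. The following minimal Python sketch enumerates the orders compatible with the PCR results reported above. It is an illustration of the reasoning, not the procedure used in the study. Under this assumption 24 of the 120 possible orders remain, which is why the EcoRI map and sequence alignment were needed to fix the final order.

```python
# Minimal sketch: which linear gene orders are compatible with clone content?
# Assumes each clone is one contiguous stretch of DNA, so the genes it contains
# must be adjacent in the true order (the "consecutive-ones" property).
from itertools import permutations

GENES = ["PSA", "KLK2", "Zyme", "NES1", "KLK1"]
CLONES = {  # PCR-positive genes per clone, as reported in the text
    "BAC 288H1": {"Zyme", "PSA", "KLK2"},
    "BAC 76F7": {"Zyme", "PSA", "KLK2"},
    "PAC 34B1": {"NES1", "KLK1"},
}


def is_contiguous(order: tuple[str, ...], genes: set[str]) -> bool:
    """True if all genes occupy consecutive positions in order."""
    positions = sorted(order.index(g) for g in genes)
    return positions[-1] - positions[0] == len(positions) - 1


def consistent_orders(genes, clones):
    """All permutations of genes in which every clone's genes are adjacent."""
    return [
        order
        for order in permutations(genes)
        if all(is_contiguous(order, content) for content in clones.values())
    ]


if __name__ == "__main__":
    orders = consistent_orders(GENES, CLONES)
    print(len(orders))  # 24 of the 120 possible orders
    print(("PSA", "KLK2", "Zyme", "NES1", "KLK1") in orders)  # True
```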
Identification of new genes 

The following rules were used to decide whether a new gene was present in the genomic area of interest:

1. Clusters of at least 3 exons should be found.
2. Only exons with a high prediction score ("good" or "excellent" quality, as indicated by the searching programs) were considered for the construction of the putative new genes.
3. Predicted exons were considered reliable only if they were identified by at least two different exon prediction programs.
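The three rules above can be expressed as a short filter. The following Python is a minimal sketch only. The source does not say how predictions from different programs were matched, so two assumptions are made here: overlapping predictions are treated as the same exon, and the predictions passed in already belong to a single candidate region. The quality label "marginal" and all coordinates are hypothetical. Program letters follow the table footnotes.

```python
# Minimal sketch of the three gene-calling rules. Assumptions (not from the
# source): predictions that overlap are treated as the same exon, and the
# predictions passed in already belong to one candidate region.
from dataclasses import dataclass

HIGH_QUALITY = {"good", "excellent"}


@dataclass(frozen=True)
class Prediction:
    program: str  # e.g. "A" = GeneBuilder (gene analysis), "C" = Grail 2
    start: int    # contig coordinate (bp)
    end: int
    quality: str  # quality label reported by the program


def supported_exons(preds, min_programs=2):
    """Rules 2 and 3: high-quality exons called by >= min_programs programs."""
    good = sorted((p for p in preds if p.quality in HIGH_QUALITY), key=lambda p: p.start)
    groups = []
    for p in good:
        if groups and p.start <= max(q.end for q in groups[-1]):
            groups[-1].append(p)  # overlaps the current candidate exon
        else:
            groups.append([p])    # starts a new candidate exon
    return [
        (min(q.start for q in g), max(q.end for q in g))
        for g in groups
        if len({q.program for q in g}) >= min_programs
    ]


def putative_gene(preds, min_exons=3, min_programs=2):
    """Rule 1: accept the region only if at least min_exons exons survive."""
    exons = supported_exons(preds, min_programs)
    return exons if len(exons) >= min_exons else None


EXAMPLE = [  # hypothetical predictions
    Prediction("A", 100, 250, "excellent"), Prediction("C", 105, 250, "good"),
    Prediction("B", 400, 600, "good"), Prediction("D", 410, 590, "excellent"),
    Prediction("A", 900, 1000, "good"), Prediction("C", 905, 1000, "good"),
    Prediction("D", 1200, 1300, "marginal"),   # fails rule 2
    Prediction("C", 1500, 1600, "excellent"),  # fails rule 3 (one program)
]

if __name__ == "__main__":
    print(putative_gene(EXAMPLE))  # [(100, 250), (400, 600), (900, 1000)]
```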

By using this strategy, eleven putative new genes were identified, of which one was found
on subsequent homology analysis to be a known gene, the human stratum corneum
chymotryptic enzyme (HSCCE). Two other putative genes (UG-1 and UG-2) were
identified that show no homology, at the protein level, with the kallikrein proteins. The eight
remaining genes all have variable homologies with known human or animal kallikrein proteins
and/or other known serine proteases.

Tables 2 to 11 show the preliminary exon structure and encoded protein for each of
the ten newly identified genes. Table 12 presents proteins that appear, on preliminary
analysis, to be homologous to the proteins encoded by the putative new genes.

DISCUSSION 

Prediction of protein-coding genes in newly sequenced DNA has become very important
with the establishment of large genome sequencing projects. The problem is complicated by
the exon-intron structure of eukaryotic genes, which interrupts the coding sequence into
many unequal parts. A number of computer programs have been developed to predict
protein-coding exons and overall gene structure. All of these programs are based on the
combination of potential functional signals with the global statistical properties of known
protein-coding regions (15). However, the most powerful approach for gene structure prediction
combines information about potential functional signals (splice sites, translation start or stop
signals, etc.) and the statistical properties of coding sequences (coding potential) with
information about homologies between the predicted protein and known protein families
(16).

In mouse and rat, kallikreins are encoded by large multigene families, and these genes
tend to cluster in groups separated by as little as 3.3 - 7.0 Kb (3). Strong conservation of
gene order between human chromosome 19q13.1 - q13.4 and 17 loci in a 20-cM proximal part
of mouse chromosome 7, including the kallikrein locus, has been documented (17).

In humans, only a few kallikrein genes have been identified. In fact, only KLK1, KLK2 and
KLK3 (PSA) are considered to represent the human kallikrein gene family (9). In this paper, we
provide strong evidence that a large number of kallikrein-like genes are clustered within a
300 Kb region on chromosome 19q13.2 - q13.4. Together with the three established human
kallikreins (KLK1, KLK2, KLK3), Zyme, NES1, the stratum corneum chymotryptic enzyme and
eight new genes (KLK-L1 to KLK-L8) may constitute a large gene family. This would bring
the total number of kallikrein or kallikrein-like genes in humans to fourteen.

Kallikrein genes are a subfamily of serine proteases, traditionally characterized by their
ability to liberate lysyl-bradykinin (kallidin) from kininogen (18). More recently, however, a
new, structural concept has emerged to describe kallikreins. From accumulated sequence data,
it is now clear that the mouse has many genes with high homology to kallikrein coding
sequences (19-20). Richard and co-workers have contributed to the concept of a "kallikrein
multigene family" to refer to these genes (21-22). This definition is based less on the specific
enzymatic function of the gene product than on sequence homology and close linkage on
mouse chromosome 7. In humans, only KLK1 meets the functional definition of a
kallikrein. KLK2 has trypsin-like enzymatic activity and KLK3 (PSA) has very weak
chymotrypsin-like enzymatic activity; these activities of KLK2 and KLK3 are not known to
liberate biologically active peptides from precursors. Under the newer definition, members
of the kallikrein family include not only the gene for the kallikrein enzyme but also genes
encoding other homologous proteases, including the enzymes that process the precursors of
nerve growth factor and epidermal growth factor (8). It is therefore important to note the clear
distinction between the enzyme kallikrein and a kallikrein or kallikrein-like gene.
To test the accuracy of the computer programs, known genomic areas containing
the PSA, Zyme and KLK2 genes were analyzed. Two of these programs (Grail 2 and GeneBuilder)
were able to detect about 95% of the tested known genes (data not shown). Matches with
expressed sequence tag (EST) sequences can also be used for gene structure prediction in
the GeneBuilder program, and this can significantly improve the power of the program, especially
at high stringency (e.g. >95% homology).
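The text does not say how the figure of about 95% was calculated. One standard way to score a gene prediction against a known gene is nucleotide-level sensitivity (Sn, the fraction of true coding bases that were predicted) and specificity (Sp, the fraction of predicted coding bases that are real), as defined by Burset and Guigó (1996). The following minimal Python sketch computes these measures. It is a swapped-in illustration, not necessarily the measure used here, and all coordinates are hypothetical.

```python
# Minimal sketch: nucleotide-level sensitivity (Sn) and specificity (Sp) of a
# gene prediction against a known annotation (measures as defined by Burset
# and Guigo, 1996). Coordinates are 1-based, inclusive and hypothetical.


def covered(intervals):
    """Set of positions covered by (start, end) intervals."""
    return {pos for start, end in intervals for pos in range(start, end + 1)}


def nucleotide_accuracy(predicted, annotated):
    """Return (Sn, Sp): TP/(TP+FN) and TP/(TP+FP) over coding nucleotides."""
    pred, real = covered(predicted), covered(annotated)
    tp = len(pred & real)
    sn = tp / len(real) if real else 0.0
    sp = tp / len(pred) if pred else 0.0
    return sn, sp


if __name__ == "__main__":
    known_exons = [(1, 100), (201, 300)]
    predicted_exons = [(11, 100), (201, 320)]
    sn, sp = nucleotide_accuracy(predicted_exons, known_exons)
    print(f"Sn = {sn:.2f}, Sp = {sp:.2f}")  # Sn = 0.95, Sp = 0.90
```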

In mouse, ten of the kallikrein genes appear to be pseudogenes (9). Two of the new
genes (UG-1 and UG-2) do not show homology with the kallikrein genes. However, one of
them (UG-2) is related to mouse myelin associated glycoprotein (Table 12). There may still be
an association between UG-2 and the kallikrein genes, since some mouse kallikreins are related
to nerve growth factor, as discussed earlier (8), and Zyme was found to be highly expressed in
brain tissue and is claimed to be related to Alzheimer's disease (11).

Having illustrated and described the principles of the invention in a preferred
embodiment, it should be appreciated by those skilled in the art that the invention can be
modified in arrangement and detail without departure from such principles. All modifications
coming within the scope of the following claims are claimed.

All publications, patents and patent applications referred to herein are incorporated by
reference in their entirety to the same extent as if each individual publication, patent or patent
application was specifically and individually indicated to be incorporated by reference in its
entirety.
Table 1. Exon or gene prediction programs used in this study

No. | Program name | Source | Website or e-mail address
1 | GeneBuilder (gene prediction) | Institute of Advanced Biomedical Technologies | http://125.itba.mi.cnr.it/~webgene/genebuilder.html
2 | GeneBuilder (exon prediction) | Institute of Advanced Biomedical Technologies | http://125.itba.mi.cnr.it/~webgene/genebuilder.html
3 | ORF gene | Institute of Advanced Biomedical Technologies | http://125.itba.mi.cnr.it/~webgene/wwworfgene2.html
4 | GENEID-3 | BioMolecular Engineering Research Center, Boston University | http://apolo.imim.es/geneid.html (geneid@darwin.bu.edu)
5 | Grail 2 | Oak Ridge National Laboratory | http://compbio.ornl.gov
6 | FGENEH | Baylor College of Medicine, Houston, Texas | http://mcrb.bcm.tmc.edu

In the final analysis of the sequences we used programs 1, 2, 4 and 5 only.



Table 2. Predicted exons of the putative gene KLK-L1. The translated protein sequences of each exon (open reading frame) are shown

Exon No. | Putative coding region¹ From (bp) | To (bp) | No. of bases | Translated protein sequence | Exon prediction program²
1 | 2263 | 2425 | 162 | SLVSGSCSQIINGEDCSPHSQPW[?]AALVMENELFCSGVLVHPQWVLSAAHCFQ | A,B,[illegible]
2 | 2847 | 3109 | 262 | SY[?]GLGLHSLEADQEPGSQMVEASLSVRHF[?]VNKPLLANDLMLIKLDESVSESDTIRSISIASQCPTAGNSCLVSGWGLLANGELT | A,B,C,D
3 | 3132 | 3317 | 185 | EVLCPVAGADPELCVPGRMPTVLQCVNVSVVSEEVCSK[?]YHPSMFCAGGGQPQKDSCN | A,B,C,D
4 | 4588 | 4737 | 149 | GDSGGPLICNGYLQGLVSP[?]KAPCGQVGVPGVY[?]KFTEWIEKTVQAS | A,B,C
5 | 12,536 | 12,607 | 71 | ANLMTTAETPCTRLLGYVVHNV | [illegible]

1. Nucleotide numbers refer to the related contig (see text and Figure 1).
2. A = GeneBuilder (gene analysis), B = GeneBuilder (exon analysis), C = Grail 2, D = GENEID-3
Table 3. Predicted exons of the putative gene KLK-L2. The translated protein sequences of each exon (open reading frame) are shown

Exon No. | Putative coding region¹ From (bp) | To (bp) | No. of bases | Translated protein sequence | Exon prediction program²
1 | 2383 | 2532 | 149 | PNHKRRGLQPALAALAGGTGHGKRTVLLGRPGASAVGAVSRTLFPEVSAE | B,D
2 | [illegible] | 3205 | 262 | SYTIGLGLHSLEADQEPGSQMVEASLSVRHPEYNRPLLANDLMLIKLDESVSESDTIRSISIASQCPTAGNSCLVSGWGLLANGELT | A,B,C,D
3 | 3228 | 3413 | 185 | EVLCPVAGADPEL[?]VPGRMPTVLQCVNVSWSEEVCSKLYDPLYHPSMFCAGGGHDQKDSCN | A,B,C,D
4 | [illegible] | [illegible] | 149 | GDSGGPLICNGYLQGLVSFGKAPCGQVGVPGVYTNLCKFTEWIEKTVQAS | A,B,C

1. Nucleotide numbers refer to the related contig (see text and Figure 1).
2. A = GeneBuilder (gene analysis), B = GeneBuilder (exon analysis), C = Grail 2, D = GENEID-3



Table 4. Predicted exons of the putative gene KLK-L3. The translated protein sequences of each exon (open reading frame) are shown

Exon No. | Putative coding sequence¹ From (bp) | To (bp) | No. of bases | Translated protein sequence | Exon prediction program²
1 | 15,916 | 15,980 | 64 | DYVIVAECDVMDARICDRVTT | B,C
2 | 16,329 | 16,394 | 65 | MGVSAELRACLAVTAWRVVLG | A,B,C,D
3 | [illegible] | [illegible] | [illegible] | HVLAIWDVSCDHPSNTVPSGSNQDLGAGAGEDARSDDSSSRHNGSDCDMHTQPWQAALLLRPNQLYCGAVLVHPQWLLTAAHCRK | A,B,C,D
4 | [illegible] | [illegible] | 215 | VYESGQQMFQGVKSIPHPGYSHPGHSNNLMLIKLNRKJ[illegible]SSHCPSAGTKCLVSGWGTTKSPQ | C,D
5 | 19,245 | 19,378 | 133 | HFPKVLQCLNISVLSQKRCEDAYPRQIDDTMFCAGDKAGRDSCQ | B,C
6 | 24,232 | 24,384 | 152 | GDSGGPVVCNGSLQGLVSWGDYPCARPNRPGVYTNLCKFTKWIQETIQANS | A,B,C
7 | 25,286 | 25,410 | 124 | LTFQEWKTDNKERDNWEAKAGESQVQEIETILANMVKPPLY | B,C

1. Nucleotide numbers refer to the related contig (see text and Figure 1).
2. A = GeneBuilder (gene analysis), B = GeneBuilder (exon analysis), C = Grail 2, D = GENEID-3




Table 5. Predicted exons of the putative gene KLK-L4. The translated protein sequences of each exon (open reading frame) are shown

Exon No. | Putative coding region¹ From (bp) | To (bp) | No. of bases | Translated protein sequence | Exon prediction program²
1 | 427 | 488 | 61 | FQFPEAPQALVQEEKEEEQE | B,C
2 | 961 | 1038 | 77 | LTMGRPRPRAAKTWMFLLLLGGAWA | B,D
3 | 1878 | 2140 | 262 | KYTVRLGDHSLQNKDGPEQBIPVVQSIPHPCYNSSDVEDHNHDLMLLQLRDQASLGSKVKPISLADHCTQPGQKCTVSGWGTVTSPR | A,B,C
4 | 4252 | 4385 | 133 | NFPDTLNCAEVKIFPQKKCEDAYPGQITDGMVCAGSSKGADTCO | [illegible]
5 | 5922 | 6074 | 152 | GDSGGPLVCDGALQGITSWGSDPCGRSDKPGVYTNICRYLDW[?]CKIIGSKG | A,B,C

1. Nucleotide numbers refer to the related contig (see text and Figure 1).
2. A = GeneBuilder (gene analysis), B = GeneBuilder (exon analysis), C = Grail 2, D = GENEID-3



Table 6. Predicted exons of the putative gene KLK-L5. The translated protein sequences of each exon (open reading frame) are shown

Exon No. | Putative coding region¹ From (bp) | To (bp) | No. of bases | Translated protein sequence | Exon prediction program²
1 | [illegible] | [illegible] | [illegible] | VHFFTPINHRGGPMEEEGDGMAYHKEALUAUCTFQDP | A,B,C,D
2 | 70,764 | 70,962 | [illegible] | CSSLTPLSLIPTPGHGWADTRAIGAHECRPNSQPWQAOL[illegible]GATLISDRWLLTAAHCRK | A,B,C,D
3 | 73,422 | 73,687 | 265 | LTSBACPSRYLWVRLGEHHLWKWEGPEQL[illegible]SANDHNDDIMLIRLPRQARLSPAVQPLNLSOTCVSPGMOCLISGWGAVSSPK | A,B,C,D
4 | [illegible] | [illegible] | [illegible] | LFPVTLQCANISILENKLCHWA[?]PGH[?]SUSMLCAGLWEGGRGSCQ | A,B,C,D

1. Nucleotide numbers refer to the related contig (see text and Figure 1).
2. A = GeneBuilder (gene analysis), B = GeneBuilder (exon analysis), C = Grail 2, D = GENEID-3
Table 7. Predicted exons of the putative gene KLK-L6. The translated protein sequences of each exon (open reading frame) are shown

Exon No. | Putative coding region¹ From (bp) | To (bp) | No. of bases | Translated protein sequence | Exon prediction program²
1 | 50,687 | 50,842 | 155 | GDSGGPLVCGGVLQGLVSWGSVGPCGQDGIPGVYTYICKYVDWIRMIMRNN | C
2 | 53,550 | 53,607 | 57 | AATAVSAATGPPEPQPQS | A,B
3 | 55,350 | 55,510 | 160 | LVGGETR[?]CGFECKPHSQPWQAALFEKTKLLCGATLIAPRWLLTAAHCLKP | [?],D
4 | 56,917 | 57,053 | 299 | RYIVHLGQHNLQKEEGCEQTRTATESFPHPGFNNSLPNKDHRNDIMLVKMASPVSITWAVRPLTLSSRCVTAGTSCLISGWGSTSSPQCRSTRGEPGRG | A,B,C
5 | [illegible] | [illegible] | 136 | RLPHTLRCANITIIEHQKCENAYPGNITDTMVCASVQEGGKDSCQ | A,B,C
6 | 57,448 | 57,597 | 149 | GDSGGPLVCNQSLQGIISWGQDPCAITRKPGVYTKVCKYVDWIQETMKNN | A,B,C

1. Nucleotide numbers refer to the related contig (see text and Figure 1).
2. A = GeneBuilder (gene analysis), B = GeneBuilder (exon analysis), C = Grail 2, D = GENEID-3



Table 8. Predicted exons of the putative gene KLK-L7. The translated protein sequences of each exon (open reading frame) are shown

Exon No. | Putative coding region¹ From (bp) | To (bp) | No. of bases | Translated protein sequence | Exon prediction program²
1 | 25,460 | [illegible] | 268 | GLKVYLGKHALGRVEAGEQVREVVHS[?]PHP[?]YKRSPIHLNHDHDIMLLELQSPVQLTGYIQTLPLSHNNRLTPGTTCRVSGWGTTTSPQ | A,B,C,D
2 | 25,879 | 27,015 | 136 | NYPKTLQCANIQLRSDEECRQVYPGKITDNMLCAGTK[?]GGKDS | A,B,C,D
3 | [illegible] | [illegible] | 185 | GDSGGPLVCNRTLYG[?]VSWGDFPCGQPDRPUVY[illegible]RWLWIRETIRKYETQQQKWLKGPQ | A,B,C
4 | [illegible] | [illegible] | 120 | [?]SSGYSMGNRLRGPSEEGTALAQAKGDUS[?]CVMAV | C,D

1. Nucleotide numbers refer to the related contig (see text and Figure 1).
2. A = GeneBuilder (gene analysis), B = GeneBuilder (exon analysis), C = Grail 2, D = GENEID-3
Table 9. Predicted exons of the putative gene KLK-L8. The translated protein sequences of each exon (open reading frame) are shown

Exon No. | Putative coding region¹ From (bp) | To (bp) | No. of bases | Translated protein sequence | Exon prediction program²
1 | 1588 | 1747 | 159 | LSQAATPKIFNGTECGRNSQPWQVGLFEGTSLRCGUVLIDHRWVLTAAHCSG | A,B,C
2 | 3576 | 3851 | 275 | VATGSRYWVRLGEHSLSQLDWTEQIRHSGFSVTHPGYLGASTSHEHDLRLLRLRLPVRVTSSVQPLPLPNDCATAGTECHVSGWGITNHPR | A,B,C,D
3 | 4806 | 4939 | 133 | PFPDLLQCLNLSIVSHATCHGVYPGRITSNMVCAGGVPGQDACQ | A,B,C,D
4 | 6023 | 6068 | 45 | MSRALVLQGREFYSE | (not given)

1. Nucleotide numbers refer to the related contig (see text and Figure 1).
2. A = GeneBuilder (gene analysis), B = GeneBuilder (exon analysis), C = Grail 2, D = GENEID-3



Table 10. Predicted exons of the unknown gene UG-1. The translated protein sequences of each exon (open reading frame) are shown

Exon No. | Putative coding region¹ From (bp) | To (bp) | No. of bases | Translated protein sequence | Exon prediction program²
1 | [illegible] | 43,194 | 77 | AVEHKEAGTQSGNLQVFWPGWCGQA | A,C
2 | [illegible] | [illegible] | 174 | RKQELMKHSSVMCQTQVSVNNQTQVSVNKMIKTPEDAAFFEPSFLEGKKTTELCCSHL | A,B,C
3 | 45,873 | 45,932 | 59 | ALGIRSKFFCLALSGIIFR | C,D
4 | 47,004 | 47,111 | 107 | EKSAVKKGTSQGDTDSKQRSWDSRHVAGPLQEEIK | A,C

1. Nucleotide numbers refer to the related contig (see text and Figure 1).
2. A = GeneBuilder (gene analysis), B = GeneBuilder (exon analysis), C = Grail 2, D = GENEID-3




Table 11. Predicted exons of the unknown gene UG-2. The translated protein sequences of each exon (open reading frame) are shown

Exon No. | Putative coding region¹ From (bp) | To (bp) | No. of bases | Translated protein sequence | Exon prediction program²
1 | 39,721 | 39,767 | 46 | LTQPPEIHVQKSCNQ | A,B,C
2 | 44,129 | 44,641 | 512 | PPLSLEPAVPERRTLRNRRSLAALAPLTPDMLLLLLPLLWGRERAEGQTSKLLTMQSSVTVQEGLCVHVPCSFSYPSHGW[?]PGPWHGYWFREGANTDQDAPVATNNPARAVWEETEU[?]RFHLLGDPHTKNCTLSIRDARRSDAGR[?]FFRMEKGS[?]CWNYKHHRLSVNVT | B,C
3 | 44,843 | 45,121 | 278 | LTHRPNILIPGTLESGCPQNLTCSVPWACEQGTPPMISWIGTSVSPLDPSTTRSSVLTLIPQPQDHGTSLTCQVTFPGASVTTNKTVHL[?]JVS | A,B,C,D
4 | 45,327 | 45,374 | 47 | PPQNLTMTVFQGDGT | A,B,D
5 | 46,318 | 46,542 | 224 | GQSLRLVCAVDAVDSNPPARLSLSWRGLTLCPSQPSNPGYLELPWVHLRDAAEFTCRAQNPLGSQQVYLNVSLQ | A,B,C
6 | 47,195 | 47,285 | 185 | KATSGVTQGVVGGAGATALVFLSFCVEFV | A,B,C,D
7 | 49,136 | 49,322 | 185 | GPLTEPWAEDSPPDQI[?]PPASARSSVGEGELQYASLSFQMVKPWDSRGQEATDTEYSEIKIHR | A,B,C

1. Nucleotide numbers refer to the related contig (see text and Figure 1).
2. A = GeneBuilder (gene analysis), B = GeneBuilder (exon analysis), C = Grail 2, D = GENEID-3





Table 12. Homology between the predicted amino acid sequences of the newly identified putative genes and protein sequences deposited in GenBank

No. | Gene identity | Homologous known protein | Identity % (number of amino acids)
1 | KLK-L1 | Human stratum corneum chymotryptic enzyme | 47 (83/176)
 | | Rat glandular kallikrein 10 | 45 (81/182)
 | | Mouse glandular kallikrein K11 | 45 (77/172)
 | | Human prostatic specific antigen | 44 (74/168)
 | | Human glandular kallikrein | 44 (71/160)
2 | KLK-L2 | Human stratum corneum chymotryptic enzyme | 47 (104/220)
 | | Rat trypsinogen V-B | 47 (95/201)
 | | Rat glandular kallikrein k2 | 45 (74/165)
 | | Mouse glandular kallikrein K2 | 43 (97/223)
 | | Human trypsinogen I | 42 (95/227)
3 | KLK-L3 | Human glandular kallikrein k2 | 57 (72/126)
 | | Rat glandular kallikrein 10 | 55 (72/131)
 | | Human trypsinogen I | 54 (69/127)
 | | Human stratum corneum chymotryptic enzyme | 52 (74/141)
 | | Rat glandular kallikrein k1 | 50 (69/137)
 | | Human prostatic specific antigen | 48 (84/174)
4 | KLK-L4 | Rat trypsinogen II | 47 (88/168)
 | | Mouse trypsinogen | 47 (93/200)
 | | Human trypsinogen I | 46 (87/188)
 | | Human glandular kallikrein | 46 (88/190)
 | | Mouse glandular kallikrein K1 | 45 (87/194)
 | | Human prostatic specific antigen | 41 (152/373)
5 | KLK-L5 | Human glandular kallikrein | 46 (133/291)
 | | Pig trypsin | 44 (134/302)
 | | Human prostatic specific antigen | 43 (129/297)
 | | Rat glandular kallikrein 8 | 42 (130/307)
 | | Rat trypsinogen II | 42 (131/279)
6 | KLK-L6 | Rat glandular kallikrein k1 | 53 (133/252)
 | | Rat trypsinogen II | 52 (146/283)
 | | Mouse trypsinogen | 49 (113/231)
 | | Human trypsinogen II | 47 (139/297)
 | | Human glandular kallikrein 1 | 47 (122/259)
 | | Human prostatic specific antigen | 47 (116/249)
7 | KLK-L7 | Rat trypsinogen III | 54 (88/163)
 | | Mouse trypsinogen | 51 (95/188)
 | | Mouse glandular kallikrein K1 | 48 (80/168)
 | | Human trypsinogen I | 45 (92/205)
 | | Human prostatic specific antigen | 45 (92/204)
8 | KLK-L8 | Rat glandular kallikrein 7 | 42 (57/136)
 | | Rat trypsinogen I | 41 (56/138)
 | | Pig trypsin | 40 (55/139)
 | | Human stratum corneum chymotryptic enzyme | 40 (52/131)
 | | Mouse glandular kallikrein K1 | 36 (49/135)
9 | UG-1 | Human zinc finger protein 91 | 43 (34/80)
10 | UG-2 | Human myeloid cell surface antigen CD33 | 59 (128/216)
 | | Mouse myelin associated glycoprotein | 38 (53/141)
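Each Identity % entry in Table 12 is the number of identical residues divided by the alignment length; for example, 83/176 rounds to 47%. BLAST counts gapped columns in the alignment length. The following minimal Python sketch reproduces that arithmetic from a gapped pairwise alignment. The example alignment is hypothetical and is not taken from any of the proteins above.

```python
# Minimal sketch: percent identity as reported in Table 12, i.e. identical
# residues divided by alignment length (BLAST counts gapped columns in the
# alignment length). The example alignment is hypothetical.


def identity_counts(a: str, b: str) -> tuple[int, int]:
    """(identical, aligned) columns of a gapped pairwise alignment; '-' = gap."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    identical = aligned = 0
    for x, y in zip(a, b):
        if x == "-" and y == "-":
            continue  # column empty in both sequences
        aligned += 1
        if x == y:
            identical += 1
    return identical, aligned


def percent_identity(identical: int, aligned: int) -> int:
    """Whole-number percentage, e.g. 83/176 -> 47."""
    return round(100 * identical / aligned)


if __name__ == "__main__":
    print(identity_counts("GDSGGPLV-CNG", "GDSGGPIVQCDG"))  # (9, 12)
    print(percent_identity(83, 176))  # 47 (KLK-L1 vs HSCCE, Table 12)
```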



We claim:



1 . An isolated nucleic acid molecule which comprises: 

(i) a nucleic acid sequence encoding a protein having substantial sequence identity,
preferably at least 70% sequence identity, with an amino acid sequence of KLK-L1 to
KLK-L8 as shown in Tables 2 to 9;

(ii) a nucleic acid sequence encoding a protein comprising an amino acid
sequence of KLK-L1 to KLK-L8 as shown in Tables 2 to 9;

(iii) nucleic acid sequences complementary to (i); 

(iv) a degenerate form of a nucleic acid sequence of (i); 

(v) a nucleic acid sequence capable of hybridizing under stringent conditions to a 
nucleic acid sequence in (i), (ii) or (iii); 

(vi) a nucleic acid sequence encoding a truncation, an analog, an allelic or species
variation of a protein comprising an amino acid sequence of KLK-L1 to
KLK-L8 as shown in Tables 2 to 9; or

(vii) a fragment, or allelic or species variation of (i), (ii) or (iii). 






ABSTRACT OF THE DISCLOSURE

The invention relates to nucleic acid molecules, proteins encoded by such nucleic acid
molecules, and use of the proteins and nucleic acid molecules.
